CHAPTER 1 
INTRODUCTION 


1.1 INTRODUCTION AND RELATED DOCUMENTATION 
This manual is a comprehensive description of the FP785 Floating-Point Accelerator and is designed as a 
training and field resource. Table 1-1 lists related hardware manuals. 


Table 1-1 Related Hardware Manuals

Title                                                         Document Number

VAX-11/785 Central Processor Unit Technical Description        EK-KA785-TD
VAX-11/785 TB, Cache and SBI Control Technical Description     EK-MM785-TD
VAX-11/780 Power System Technical Description                  EK-PS780-TD
VAX-11/785 Installation Manual                                 EK-SI785-IN
VAX-11/785 Hardware User’s Guide                               EK-11785-UG
VAX Diagnostic User Guide                                      EK-VX11D-UG
VAX Maintenance Handbook, VAX Systems                          EK-VAXV1-HB
VAX Maintenance Handbook, VAX-11/780                           EK-VAXV2-HB
VAX Hardware Handbook, 1982-83                                 EB-21710
VAX Architecture Handbook, 1981                                EB-19580
DS780 Diagnostic System User Guide                             EK-DS780-UG
DS780 Diagnostic System Technical Description                  EK-DS780-TD
System Maintenance Print Set                                   MP01747-**
CPU Maintenance Print Set                                      MP01749-**
FPA Maintenance Print Set                                      MP01750-**


**The latest revision is sent if no revision is specified. 
Hard-copy documents may be ordered through the nearest DIGITAL sales office, through the Accessories and
Supplies Group catalog, or from one of the following sources.


Hardware documents: 


Digital Equipment Corporation
Publishing and Circulation Services, NRO3-1/W3
10 Forbes Road
Northboro, MA 01532


Handbooks and software manuals:


Digital Equipment Corporation
Software Distribution Center, NRO2-1/J6
444 Whitney Street
Northboro, MA 01532


Educational Services Distribution Center
Digital Equipment Corporation
12A Esquire Rd
Brookside Industrial Park
North Billerica, MA


Technical and service documents that support the VAX-11/785 system (including maintenance print sets 
and diagnostic listings) are also available on microfiche. For information on microfiche libraries contact: 


Digital Equipment Corporation
Micropublishing Systems, FPO/B5
30 North Avenue
Burlington, MA 01803


1.2 GENERAL DESCRIPTION 

The FP785 Floating-Point Accelerator (FPA) is a hardware option available on the VAX-11/785 computer
system. Working in conjunction with the KA785 central processor, the FPA speeds the execution of
floating-point arithmetic instructions: it overrides the CPU floating-point microcode and executes these
instructions in dedicated hardware. Some FPA operations overlap CPU operations, so the CPU can proceed
with other tasks while the FPA completes a floating-point instruction; this overlap further speeds program
execution. FPA operation is transparent to both macro-level software and main-machine microcode. The
FPA also speeds the execution of some integer arithmetic instructions, and it handles both single-precision
(float) and double-precision data.


The FPA can handle a wide range of numbers. Floating-point numbers from $-1.7 \times 10^{38}$ to
$+1.7 \times 10^{38}$ can be represented, and the smallest nonzero magnitude the FPA can represent is
$0.29 \times 10^{-38}$. A single-precision number is accurate to about 7 decimal digits; a double-precision
number is accurate to about 16 decimal digits. The FPA can also handle 32-bit signed integers from
-2,147,483,648 to 2,147,483,647 inclusive.
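The range figures above follow directly from the exponent and fraction widths. The following is a minimal sketch that computes them exactly with Python's `fractions` module. It assumes the standard VAX layout: an 8-bit excess-80 exponent whose biased value 0 is reserved (it encodes zero), so usable unbiased exponents run from -127 to +127, and a normalized fraction of 24 bits (single precision) or 56 bits (double precision), counting the hidden bit. The function name `vax_range` is illustrative only.

```python
from fractions import Fraction

def vax_range(fraction_bits: int) -> tuple[Fraction, Fraction]:
    """Return (smallest, largest) positive normalized magnitudes.

    Assumes an 8-bit excess-80 exponent whose usable biased values are
    1..255 (unbiased -127..+127) and a normalized fraction 0.1xxx...x of
    `fraction_bits` bits, counting the hidden bit (24 for F, 56 for D).
    """
    smallest = Fraction(1, 2) * Fraction(1, 2**127)          # .1000...0 x 2^-127
    largest = (1 - Fraction(1, 2**fraction_bits)) * 2**127   # .1111...1 x 2^+127
    return smallest, largest

for name, bits in (("F (single)", 24), ("D (double)", 56)):
    lo, hi = vax_range(bits)
    print(f"{name}: {float(lo):.3e} to {float(hi):.3e}")
```

Both lines print roughly 2.939e-39 to 1.701e+38, matching the $0.29 \times 10^{-38}$ and $1.7 \times 10^{38}$ figures: the extra fraction bits of double precision buy precision, not range.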


The FPA is a microprogrammed device that operates as a synchronous extension of the CPU data path. Both
the FPA and the CPU use a 133.3 ns microcycle, and FPA T0 coincides with CPU T0. As an extension of the
CPU, the FPA does not access memory. The CPU must calculate memory addresses, access the calculated
addresses, and transmit the accessed data to the FPA. The CPU is also responsible for fetching and storing
the FPA results. The FPA performs only the required floating-point or integer operation on the properly
formatted operands transmitted to it.
The FPA executes floating-point addition, subtraction, multiplication, and division instructions. It receives a
packed, normalized floating-point number containing a sign bit, fraction bits, and exponent bits. The FPA
breaks the number into these parts, and the FPA data manipulation sections perform the operations required
on each part to carry out the instruction. When the result is complete, the FPA normalizes and packs it for
return to the CPU. Refer to the simplified diagram of the FPA (Figure 1-1).
Figure 1-1 The FPA 


Accelerator Interface 

The FPA is an optional hardware extension of the CPU data path. It is the first of a series of optional 
accelerators that can be plugged into slots 24 through 28 of the CPU backplane. To facilitate design of 
these optional accelerators, a set of standard interface signals and buses is used to transfer data and control 
information. 


Two copies of the CPU general register set are kept in the FPA. These copies are read-only to the FPA and
provide rapid access to register operands used in instructions. Every time the CPU updates a general
register, a copy of the updated data is transmitted to the FPA via the DFMX bus.


All other data (memory and literal) is transmitted to the accelerator via the ID bus. Memory data is 
transferred into the CPU D register and then onto the ID bus. Literal data is transferred from the 
instruction buffer via the ID bus. 


All op codes are received from the instruction buffer. The FPA uses dedicated hardware to handle certain 
op codes. The op codes are decoded and, if they are part of the FPA-implemented set, processing is 
started. 


FPA results are returned to the CPU via the DFMX bus. Every transfer of data (operands or results)
between the CPU and FPA is controlled by the CPSYNC and FPSYNC signals; CPSYNC is transmitted via the
CS bus. When an operand is transferred to the FPA, the CPU asserts CPSYNC to indicate that data is
available on the ID bus, and the FPA asserts FPSYNC to indicate that the data has been received. When the
FPA returns a result, FPSYNC indicates that the result is available and CPSYNC indicates that the result has
been received. When a result is transferred, the FPA also transmits the proper condition codes to the CPU.


Traps and errors are handled with three signals: ACC ERROR (from FPA to CPU), FP TRAP (CPU to FPA),
and ACC TRAP (CPU to FPA). ACC ERROR (also called ERRSYNC) is asserted when the FPA detects an
internal error; it is an input to the CPU BEN multiplexer. The CPU uses FP TRAP to initiate
microdiagnostics stored in the FPA. ACC TRAP selects either the power-up trap or the abort trap (both
stored in the FPA microcode).


1.3 FPA INSTRUCTION SET 

The FPA handles only a limited number of instructions (refer to Table 1-2). No floating-point instructions
are available in PDP-11 compatibility mode. As shown in the table, the FPA handles single- and double-
precision instructions in both 2-operand and 3-operand formats, and it handles the single- and
double-precision variations internally. However, as stated before, the FPA does no memory accessing. The
CPU must therefore do all address calculations and memory accesses for any input operands stored in
memory. Also, the FPA does not store any final results; it merely makes the results available to the DFMX
bus. The CPU must enable the result onto the DFMX bus, determine the result destination, and put the
result into the destination. In a 3-operand instruction, the FPA begins computing as soon as it has the two
source operands, while the CPU is still calculating the address of the third (destination) operand.


Table 1-2 FPA Instruction Set

Mnemonic   Description

ADDF*      Add single-precision floating-point
ADDD*      Add double-precision floating-point
SUBF*      Subtract single-precision floating-point
SUBD*      Subtract double-precision floating-point
MULF*      Multiply single-precision floating-point
MULD*      Multiply double-precision floating-point
DIVF*      Divide single-precision floating-point
DIVD*      Divide double-precision floating-point
POLYF      Evaluate polynomial single-precision floating-point
POLYD      Evaluate polynomial double-precision floating-point
EMODF      Extended multiply and integerize single-precision floating-point
EMODD      Extended multiply and integerize double-precision floating-point
MULL*      Multiply integer longword

*The FPA instruction set includes both the 2-operand and 3-operand formats of these instructions.


1.4 PHYSICAL DESCRIPTION 

The FPA consists of five hex-height, extended-length modules containing mostly FAST TTL logic. They
replace blank modules 7014103 in slots 24 through 28 of the KA785 backplane; these slots are designated
the accelerator option slots. The FPA is powered by an H7100 power supply installed in power supply
position 1, which is the left-most location in the VAX-11/785 CPU cabinet when viewed from the front.
Position 1 is left empty if an accelerator is not installed. The H7100 is a 5 V, 100 A supply. Refer to
Figure 1-2 for the location of the backplane slots and power supply. Refer to Table 1-3 for module
designations and locations.


1.5 FLOATING-POINT NUMBERS AND ARITHMETIC 

This section discusses some fundamentals of floating-point numbers and arithmetic. It provides useful
background for more advanced topics in later sections. Readers already familiar with floating-point
arithmetic may skip this section.
1.5.1 Integers 

All data within a computer system could be represented in integer form. The unsigned integers that can be
represented in a 32-bit machine range in magnitude from $00000000_{16}$ to $\mathrm{FFFFFFFF}_{16}$ (0 to
4,294,967,295 decimal). However, integer form imposes some limitations. Only whole numbers can be
represented; there are no fraction or decimal parts. This imposes an accuracy limitation. Furthermore,
numbers greater than 4,294,967,295 cannot be represented, which imposes a range limitation.


These limitations are imposed by the stationary position of the radix point (for example, the decimal point 
in base 10 notation or the binary point in base 2 notation). An integer’s radix point is usually omitted in 
integer representation because it always marks the integer’s least significant place. That is, there are never 
any digits to the right of an integer’s radix point. For this reason, an integer is sometimes called a fixed- 
point number. 


Integer notation, however, can be modified to overcome the range and accuracy limitations imposed by the 
fixed radix point. This is done through the use of floating-point notation. 
Figure 1-2 FPA Physical Location 
Table 1-3 FPA Modules

Module No.   Slot   Module Function

M7540        24     Normalization and fraction division
M7541        25     Fraction multiplication (most significant bits)
M7542        26     Fraction multiplication (least significant bits)
M7543        27     Fraction addition and subtraction
M7544        28     Exponent manipulation and FPA control


1.5.2 Floating-Point Numbers 

Floating-point numbers, unlike integers, have no position restrictions imposed on their radix points. A 
popular type of floating-point representation is called scientific notation. With scientific notation, a 
floating-point number is represented by some basic value multiplied by the radix raised to some power. 


Example

$$1{,}000{,}000 = 1.0 \times 10^{6}$$

Here 1.0 is the basic value, 10 is the radix, and 6 is the exponent.


There are many ways to represent the same number in scientific notation, as shown in the following 
example. 


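$$
\begin{array}{ll}
\text{Right shifts} & \text{Left shifts} \\
512 = 512. \times 10^{0} & 512 = 512 \times 10^{0} \\
\phantom{512} = 51.2 \times 10^{1} & \phantom{512} = 5120 \times 10^{-1} \\
\phantom{512} = 5.12 \times 10^{2} & \phantom{512} = 51200 \times 10^{-2} \\
\phantom{512} = .512 \times 10^{3} & \phantom{512} = 512000 \times 10^{-3}
\end{array}
$$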
The convention chosen for representing floating-point numbers with scientific notation in the FPA requires
the radix point to always be to the left of the most significant digit in the basic value (such as
$.512 \times 10^{3}$ in the above example). This modified basic value is called a fraction.


Notice that for each right shift of the basic value, the exponent is incremented; for each left shift the 
exponent is decremented. The value of the number remains constant if the exponent is adjusted for each 
shift of the basic value. 


More examples of scientific notation follow. 


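Decimal          Decimal                   Binary       Hex            Hex
Notation         Scientific Notation       Notation     Notation       Scientific Notation

64               $.64 \times 10^{2}$          1000000.     $40_{16}$         $.4 \times 16^{2}$
33               $.33 \times 10^{2}$          100001.      $21_{16}$         $.21 \times 16^{2}$
1/2 (.5)         $.5 \times 10^{0}$           0.1          $.8_{16}$         $.8 \times 16^{0}$
3/32 (.09375)    $.9375 \times 10^{-1}$       0.00011      $.18_{16}$        $.18 \times 16^{0}$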


1.5.3 Decimal/Binary/Hexadecimal Conversion 

There are standard routines to convert from decimal notation to hexadecimal (also called hex) and back. 
When converting from either decimal to hex or hex to decimal, it is convenient to first convert to binary 
notation and then to the final notation. 


1.5.3.1 Decimal to Hex Conversion — To convert a decimal number with both integer and fraction 
portion to a hex number, the integer and fraction are separated and converted individually. The integer is 
converted to binary by a repeated division technique and the fraction by a repeated multiplication 
technique. The resulting binary number is then converted to hex. 


To convert an integer to binary representation, divide the integer by two. The remainder of this division
(either 1 or 0) becomes the LSB of the binary representation. The quotient is again divided by two, and the
remainder of this division goes to the left of the LSB, becoming the "next to LSB" bit. The quotient is
divided again, and the process continues until the quotient is zero. Refer to Example 1.
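The repeated-division procedure can be sketched in a few lines. This is an illustrative sketch, not FPA microcode; the function name `int_to_binary` is hypothetical. Each remainder is one bit, produced LSB first, so the bits are collected and then reversed.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by two.

    The first remainder is the LSB, so bits are collected right to left
    and reversed at the end.
    """
    if n < 0:
        raise ValueError("expects a non-negative integer")
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        n, remainder = divmod(n, 2)   # the quotient feeds the next step
        bits.append(str(remainder))   # the remainder is the next bit, LSB first
    return "".join(reversed(bits))

print(int_to_binary(197))  # 11000101, as in Example 1
```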


A repeated multiply-by-two converts a decimal fraction to a binary fraction. The decimal fraction is
multiplied by two. If the result is 1.0 or more, a 1 is placed in the MSB of the fraction (directly to the right
of the binary point); if it is less than 1.0, a 0 is placed there. Only the fraction portion of this result is
again multiplied by two. If the result is 1.0 or more, a 1 goes to the right of the MSB; if less than 1.0, a 0.
This continues until the fraction portion of the result is all zeros (refer to Example 2) or until enough binary
fraction bits have been generated to represent the decimal value accurately enough (see Example 3). Note
that finite-length decimal fractions can become repeating fractions in binary (Example 3).
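The repeated multiply-by-two procedure can be sketched as follows. The sketch uses exact rational arithmetic (`fractions.Fraction`) so that decimal inputs such as .603 are not disturbed by binary floating-point rounding; the function name and the `max_bits` stopping limit are illustrative assumptions. When the limit is reached the result is simply truncated, as in Example 3.

```python
from fractions import Fraction

def fraction_to_binary(value: Fraction, max_bits: int = 24) -> str:
    """Convert a fraction 0 <= value < 1 to binary by repeated multiplication by two.

    Stops when the remaining fraction portion is zero or after max_bits bits,
    whichever comes first (the second case truncates rather than rounds).
    """
    if not 0 <= value < 1:
        raise ValueError("expects 0 <= value < 1")
    bits = []
    while value != 0 and len(bits) < max_bits:
        value *= 2
        if value >= 1:        # result 1.0 or more: next bit is 1
            bits.append("1")
            value -= 1        # keep only the fraction portion
        else:                 # result less than 1.0: next bit is 0
            bits.append("0")
    return "." + ("".join(bits) or "0")

print(fraction_to_binary(Fraction(3, 8)))                 # .011 (Example 2)
print(fraction_to_binary(Fraction("0.603"), max_bits=7))  # .1001101 (Example 3)
```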
Example 1 Convert $197_{10}$ to binary

STEP 1   197 / 2 = 98   R 1   (LSB)
STEP 2    98 / 2 = 49   R 0
STEP 3    49 / 2 = 24   R 1
STEP 4    24 / 2 = 12   R 0
STEP 5    12 / 2 = 6    R 0
STEP 6     6 / 2 = 3    R 0
STEP 7     3 / 2 = 1    R 1
STEP 8     1 / 2 = 0    R 1   (MSB)

Reading the remainders from STEP 8 back to STEP 1:

$$197_{10} = 1100\,0101_2$$
Example 2 Convert 3/8 (.375) to binary

STEP 1   .375 × 2 = 0.750   → 0   (MSB)
STEP 2   .750 × 2 = 1.500   → 1
STEP 3   .500 × 2 = 1.000   → 1   (fraction portion is zero; stop)

$$3/8 = .375_{10} = .011_2$$
Example 3 Convert $.603_{10}$ to binary

STEP 1   .603 × 2 = 1.206   → 1   (MSB)
STEP 2   .206 × 2 = 0.412   → 0
STEP 3   .412 × 2 = 0.824   → 0
STEP 4   .824 × 2 = 1.648   → 1
STEP 5   .648 × 2 = 1.296   → 1
STEP 6   .296 × 2 = 0.592   → 0
STEP 7   .592 × 2 = 1.184   → 1

Decide to stop:

$$.603_{10} \approx .1001\,101_2$$
The conversion from binary to hex is very simple. Starting at the binary point, break the binary number 
into groups of four digits each (zero fill at both right and left ends to complete groups of four). Then 
replace each group of four with its hex equivalent. Refer to Table 1-4 and Example 4. 


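Table 1-4 Binary-Hex Equivalents

Binary   Hex        Binary   Hex

0000     0          1000     8
0001     1          1001     9
0010     2          1010     A
0011     3          1011     B
0100     4          1100     C
0101     5          1101     D
0110     6          1110     E
0111     7          1111     F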


Example 4 Convert $110010110.101101_2$ to hex

1. Break into groups of four, starting at the binary point, and zero-fill the left and right ends
   (three zeros are added at the left end and two zeros at the right end).

   0001 1001 0110 . 1011 0100

2. Replace each four-digit group with its hex equivalent. Refer to Table 1-4.

   0001 1001 0110 . 1011 0100
      1    9    6 .    B    4

$$1\,1001\,0110.1011\,01_2 = 196.\mathrm{B4}_{16}$$
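The grouping rule in Example 4 can be sketched as follows; the function name `binary_to_hex` is illustrative only. Groups are formed outward from the binary point, so the integer part is zero-filled on the left and the fraction part on the right, exactly as in step 1 of Example 4.

```python
HEX_DIGITS = "0123456789ABCDEF"

def binary_to_hex(binary: str) -> str:
    """Convert a binary string such as '110010110.101101' to hex.

    Groups of four are formed outward from the binary point: the integer
    part is zero-filled on the left and the fraction part on the right.
    """
    int_part, _, frac_part = binary.partition(".")
    int_part = int_part or "0"
    int_part = int_part.zfill(-(-len(int_part) // 4) * 4)          # pad left to a multiple of 4
    frac_part = frac_part.ljust(-(-len(frac_part) // 4) * 4, "0")  # pad right to a multiple of 4

    def to_hex(bits: str) -> str:
        # Replace each four-bit group with its hex digit (Table 1-4).
        return "".join(HEX_DIGITS[int(bits[i:i + 4], 2)] for i in range(0, len(bits), 4))

    result = to_hex(int_part)
    if frac_part:
        result += "." + to_hex(frac_part)
    return result

print(binary_to_hex("110010110.101101"))  # 196.B4 (Example 4)
```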
1.5.3.2 Hex to Decimal Conversion — To convert from hex back to decimal, first replace each hex digit
with its 4-bit binary equivalent (see Table 1-4). Each position in a binary number has a positional value
determined by which side of the binary point it lies on and by its distance from the binary point. The
positional values are powers of two. The bit in the units column has a positional value of one. The
positional value doubles with each move from right to left and halves with each move from left to right.
Refer to Figure 1-3 for a summary of binary positional values, in both powers of two and decimal values.


To convert from binary notation to decimal notation, add the decimal positional value of each bit that is a 
one. This sum is the decimal equivalent of the binary number. 
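The positional-value rule can be sketched as follows, again with exact `Fraction` arithmetic so that bits right of the binary point contribute exactly 1/2, 1/4, 1/8, and so on. The function names are illustrative; `hex_to_decimal` first expands each hex digit to four bits, as described above.

```python
from fractions import Fraction

def binary_to_decimal(binary: str) -> Fraction:
    """Sum the positional values (powers of two) of every bit that is a one."""
    int_part, _, frac_part = binary.partition(".")
    total = Fraction(0)
    for position, bit in enumerate(reversed(int_part)):   # values 2^0, 2^1, ...
        if bit == "1":
            total += 2 ** position
    for position, bit in enumerate(frac_part, start=1):   # values 2^-1, 2^-2, ...
        if bit == "1":
            total += Fraction(1, 2 ** position)
    return total

def hex_to_decimal(hex_string: str) -> Fraction:
    """Replace each hex digit with its 4-bit equivalent, then sum positional values."""
    binary = "".join(
        "." if ch == "." else format(int(ch, 16), "04b") for ch in hex_string
    )
    return binary_to_decimal(binary)

print(float(hex_to_decimal("196.B4")))  # 406.703125
```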


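$$
\begin{array}{rrrrrrrr|rrrrrr}
2^{7} & 2^{6} & 2^{5} & 2^{4} & 2^{3} & 2^{2} & 2^{1} & 2^{0} & 2^{-1} & 2^{-2} & 2^{-3} & 2^{-4} & 2^{-5} & 2^{-6} \\
128 & 64 & 32 & 16 & 8 & 4 & 2 & 1 & \tfrac{1}{2} & \tfrac{1}{4} & \tfrac{1}{8} & \tfrac{1}{16} & \tfrac{1}{32} & \tfrac{1}{64} \\
 & & & & & & & & .5 & .25 & .125 & .0625 & .03125 & .015625
\end{array}
$$

(The vertical rule marks the binary point.)

Figure 1-3 Positional Value of Binary Number 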


1.5.4 Normalization 

As discussed previously, there are many ways to represent a particular floating-point number using 
scientific notation. The convention chosen for representing VAX-11 floating-point numbers requires the 
radix point to be to the left of the most significant bit in the basic value. Refer to Example 5. 


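Example 5 Floating-Point Form

$$29_{10} = 11101_2$$

$$
\begin{array}{ll}
\text{Right shifts} & \text{Left shifts} \\
11101. \times 2^{0} & 11101. \times 2^{0} \\
1110.1 \times 2^{1} & 111010. \times 2^{-1} \\
111.01 \times 2^{2} & 1110100. \times 2^{-2} \\
11.101 \times 2^{3} & 11101000. \times 2^{-3} \\
1.1101 \times 2^{4} & 111010000. \times 2^{-4} \\
.11101 \times 2^{5} \;\leftarrow \text{chosen form} & 1110100000. \times 2^{-5} \\
.011101 \times 2^{6} & 11101000000. \times 2^{-6} \\
.0011101 \times 2^{7} & 111010000000. \times 2^{-7}
\end{array}
$$

In the chosen form, .11101 is the fraction and 5 is the exponent.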


The process of ensuring that the first significant bit is directly to the right of the binary point is called 
normalization. If the number is one or larger, it involves right-shifting the basic value and incrementing 
the exponent until the MSB (a one) is directly to the right of the binary point. If the number is a fraction 
with leading zeroes, the basic value is left-shifted and the exponent is decremented. Examples 6 and 7 
show conversion of numbers to VAX-11 normalized form. 


Example 6 Convert $75_{10}$ to a normalized binary number

1. Integer conversion
   $75_{10} = 100\,1011_2$

2. Floating-point form
   $100\,1011_2 = 100\,1011_2 \times 2^{0}$

3. Normalized form
   Right shift fraction 7 times
   Increment exponent by 7

   $100\,1011_2 \times 2^{0} = .100\,1011 \times 2^{7}$

   Fraction = .100 1011
   Exponent = 7


Example 7 Convert 3/16 (.1875) to a normalized binary number.

1. Fraction conversion
   $.1875_{10} = .0011_2$

2. Floating-point form
   $.0011_2 = .0011_2 \times 2^{0}$

3. Normalized form
   Left shift fraction 2 times
   Decrement exponent by 2

   $.0011_2 \times 2^{0} = .11 \times 2^{-2}$

   Fraction = .11
   Exponent = -2
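The normalization rule can be sketched as a pair of shift loops. This sketch works on exact `Fraction` magnitudes rather than bit fields (an illustrative simplification), and the function name is hypothetical. Dividing by two is a right shift, so each one increments the exponent; multiplying by two is a left shift, so each one decrements it.

```python
from fractions import Fraction

def normalize(value: Fraction) -> tuple[Fraction, int]:
    """Return (fraction, exponent) with value == fraction * 2**exponent and
    1/2 <= fraction < 1, so the first 1 bit sits just right of the binary point.

    Works on a positive magnitude; zero has no normalized form.
    """
    if value <= 0:
        raise ValueError("expects a positive magnitude")
    fraction, exponent = value, 0
    while fraction >= 1:               # number is one or larger:
        fraction /= 2                  #   shift right ...
        exponent += 1                  #   ... and increment the exponent
    while fraction < Fraction(1, 2):   # leading zeros after the binary point:
        fraction *= 2                  #   shift left ...
        exponent -= 1                  #   ... and decrement the exponent
    return fraction, exponent

print(normalize(Fraction(75)))     # (Fraction(75, 128), 7), i.e. .1001011 x 2^7
print(normalize(Fraction(3, 16)))  # (Fraction(3, 4), -2),   i.e. .11 x 2^-2
```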


1.5.5 VAX-11 Floating-Point Notation 

Two conventions are used in the FPA to conserve memory space without losing accuracy and to aid in 
hardware manipulation. The first convention is called the hidden bit. All numbers transferred between the 
CPU and FPA are normalized floating-point numbers. This means the first significant bit, always a 1, is 
always directly to the right of the binary point. To conserve memory space and data lines, the first 
significant bit is neither stored nor transmitted to the FPA. For example, the fraction part of the 
normalized binary number .11000... x 2-2 is stored and transmitted to the FPA as 100.... The normalized 
fraction of 1/2 (.100... x 2°) is stored and transmitted as 000... . In both cases the first 1, the hidden bit, is 
added by hardware in the FPA. When the FPA transfers a normalized answer back to the CPU the hidden 
bit is not sent. 


The second convention is excess 80 notation. The 8-bit exponent portion of a floating-point number is
stored using excess $80_{16}$ notation, which simplifies the hardware that manipulates the exponent during
floating-point arithmetic. An excess $80_{16}$ exponent is obtained by adding $10000000_2$ ($200_8$, $80_{16}$,
or $128_{10}$) to the exponent's 2's complement value.


Refer to Section 1.6 for a further discussion of excess 80 notation. 
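Both conventions can be illustrated together. The following sketch splits a value into sign, excess-80 exponent, and the 23 fraction bits that are actually stored for single precision, dropping the hidden bit; `decode_f_fields` reverses the process. The names, the 23-bit width, and the refusal to round are assumptions made for illustration; the sketch also does not check that the exponent stays within the 8-bit field.

```python
from fractions import Fraction

EXCESS = 0x80  # excess 80 (hex), i.e. a bias of 128 decimal

def encode_f_fields(value: Fraction) -> tuple[int, int, int]:
    """Split a nonzero value into (sign, biased exponent, 23 stored fraction bits).

    Sketch only: assumes the value is within single-precision range and
    exactly representable in 24 significant bits; no rounding is attempted.
    """
    if value == 0:
        raise ValueError("zero has no normalized form")
    sign = 1 if value < 0 else 0
    fraction, exponent = abs(value), 0
    while fraction >= 1:                        # normalize: 1/2 <= fraction < 1
        fraction, exponent = fraction / 2, exponent + 1
    while fraction < Fraction(1, 2):
        fraction, exponent = fraction * 2, exponent - 1
    scaled = fraction * 2**24                   # .1xxx...x as a 24-bit integer
    if scaled.denominator != 1:
        raise ValueError("value needs more than 24 significant bits")
    stored = int(scaled) - (1 << 23)            # drop the hidden bit (always 1)
    return sign, exponent + EXCESS, stored

def decode_f_fields(sign: int, biased_exponent: int, stored: int) -> Fraction:
    """Reinsert the hidden bit and remove the excess-80 bias."""
    fraction = Fraction((1 << 23) | stored, 2**24)
    value = fraction * Fraction(2) ** (biased_exponent - EXCESS)
    return -value if sign else value

print(encode_f_fields(Fraction(3, 16)))  # (0, 126, 4194304)
print(encode_f_fields(Fraction(1, 2)))   # (0, 128, 0)
```

For 3/16 ($.11 \times 2^{-2}$), the biased exponent 126 is $7\mathrm{E}_{16}$, and 4194304 is $400000_{16}$: the stored bits begin 1000..., the leading 1 having been hidden. For 1/2 the stored fraction is all zeros, as described above.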
1.5.6 Floating-Point Addition and Subtraction 

To perform floating-point addition or subtraction, the exponents of the two floating-point numbers must be
aligned (equal). If they are not, the fraction with the smaller exponent is shifted right until the exponents
are equal; each right shift is accompanied by an increment of that fraction's exponent. When the exponents
are aligned, the fractions can be added or subtracted. The exponent value then indicates the number of
places the binary point must be moved to obtain the integer representation of the number.


In Example 8, the number $7_{10}$ is added to the number $40_{10}$ using floating-point representation. Note
that the exponents are first aligned and then the fractions are added. The exponent value dictates the final
location of the binary point.


Example 8 Floating-Point Addition

$$
\begin{array}{rl}
0.1010\,0000\,0000\,000 \times 2^{6} & = 28_{16} = 40_{10} \\
+\,0.1110\,0000\,0000\,000 \times 2^{3} & = 7_{16} = 7_{10}
\end{array}
$$

1. To align the exponents, shift the fraction with the smaller exponent three places to the right and
   increment its exponent by 3; then add the two fractions.

$$
\begin{array}{rl}
0.1010\,0000\,0000\,000 \times 2^{6} & = 28_{16} = 40_{10} \\
+\,0.0001\,1100\,0000\,000 \times 2^{6} & = 7_{16} = 7_{10} \\
\hline
0.1011\,1100\,0000\,000 \times 2^{6} & = 2\mathrm{F}_{16} = 47_{10}
\end{array}
$$

2. To find the integer value of the answer, move the binary point six places to the right.

$$010\,1111.0000\,0000\,0_2 = 47_{10}$$
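The alignment step of Example 8 can be sketched on integer fractions. In this illustrative model, each operand is a `(fraction, exponent)` pair whose fraction is an integer holding 15 bits to the right of the binary point, the width used in Example 8. The sketch handles positive operands only and simply drops bits shifted off the right end; the FPA's guard bits and rounding are not modeled.

```python
FRACTION_BITS = 15  # fraction width used in Example 8 (illustrative)

def fp_add(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Add two positive (fraction, exponent) pairs.

    Each fraction is an integer holding FRACTION_BITS bits to the right of
    the binary point, so (f, e) stands for f / 2**FRACTION_BITS * 2**e.
    Sketch only: positive operands; bits shifted off the right end are
    dropped (no guard bits or rounding).
    """
    (fa, ea), (fb, eb) = a, b
    if ea < eb:                            # let 'a' hold the larger exponent
        (fa, ea), (fb, eb) = (fb, eb), (fa, ea)
    fb >>= ea - eb                         # align: one right shift per exponent step
    total, exponent = fa + fb, ea          # both fractions now share exponent ea
    if total >> FRACTION_BITS:             # carry into the units place:
        total >>= 1                        #   shift right once more ...
        exponent += 1                      #   ... and increment the exponent
    return total, exponent

forty = (0b101000000000000, 6)   # 0.1010 0000 0000 000 x 2^6
seven = (0b111000000000000, 3)   # 0.1110 0000 0000 000 x 2^3
fraction, exponent = fp_add(forty, seven)
print(f"0.{fraction:015b} x 2^{exponent}")          # 0.101111000000000 x 2^6
print(fraction * 2**exponent // 2**FRACTION_BITS)   # 47
```

The final carry check covers a case Example 8 does not need: the sum of two normalized fractions can reach 2, in which case one right shift renormalizes it.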


1.5.7 Floating-Point Multiplication and Division 
In floating-point multiplication, the fractions are multiplied and the exponents are added. For floating- 
point division, the fractions are divided and the exponents are subtracted. There is no requirement to align 
the binary point in the floating-point multiplication or division. Example 9 shows floating-point multiplica- 
tion and Example 10 shows floating-point division. 
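Because no alignment is needed, multiplication and division reduce to one fraction operation, one exponent operation, and a final normalization. The following sketch models this with exact `Fraction` fractions and unbiased exponents (excess 80 adjustments are covered in Section 1.6); the function names are illustrative. The operands are those of Examples 9 and 10, and the operands are assumed to be nonzero.

```python
from fractions import Fraction

Number = tuple[Fraction, int]  # (fraction, unbiased exponent)

def normalize(fraction: Fraction, exponent: int) -> Number:
    """Shift until 1/2 <= fraction < 1, adjusting the exponent to match (nonzero input)."""
    while fraction >= 1:
        fraction, exponent = fraction / 2, exponent + 1
    while fraction < Fraction(1, 2):
        fraction, exponent = fraction * 2, exponent - 1
    return fraction, exponent

def fp_multiply(a: Number, b: Number) -> Number:
    """Multiply the fractions and add the exponents."""
    return normalize(a[0] * b[0], a[1] + b[1])

def fp_divide(a: Number, b: Number) -> Number:
    """Divide the fractions and subtract the exponents."""
    return normalize(a[0] / b[0], a[1] - b[1])

seven = (Fraction(0b111, 0b1000), 3)      # .111 x 2^3  = 7
forty = (Fraction(0b101, 0b1000), 6)      # .101 x 2^6  = 40
fifteen = (Fraction(0b1111, 0b10000), 4)  # .1111 x 2^4 = 15
five = (Fraction(0b101, 0b1000), 3)       # .101 x 2^3  = 5

print(fp_multiply(seven, forty))  # (Fraction(35, 64), 9), i.e. .100011 x 2^9 = 280
print(fp_divide(fifteen, five))   # (Fraction(3, 4), 2),   i.e. .11 x 2^2 = 3
```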


1.6 EXCESS 80 NOTATION 

The VAX family of computers, including this FPA option, uses excess 80 notation to store and handle the
exponent portion of floating-point numbers. An excess 80 exponent is the exponent's 2's complement value
plus $128_{10}$ ($80_{16}$).


It is convenient to handle the exponent portion of a floating-point number in 2's complement notation,
because this allows a wide range of both positive and negative exponents to be represented. However, in 2's
complement notation an overflow (a carry out of the most significant bit) must occur to go from the least
negative number, -1 ($\mathrm{FF}_{16}$), to zero. To avoid this, the bias of $128_{10}$ is added to the 2's
complement number.


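Example 9: Multiply $7_{10}$ by $40_{10}$.

1. Multiply the fractions and add the exponents.

$$
\begin{array}{r}
0.1110000 \times 2^{3} \\
\times\; 0.1010000 \times 2^{6} \\
\hline
.1000110000 \times 2^{9}
\end{array}
$$

   Fraction: $.111 \times .101 = .100011$ (the partial products 111, 0000, and 11100 sum to 100011)
   Exponent: $3 + 6 = 9$
   The result is already in normalized form.

2. Move the binary point nine places to the right.

$$100011000.00000_2 = 118_{16} = 280_{10}$$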


Example 10: Divide $15_{10}$ by $5_{10}$.

1. Divide the fractions.

$$\frac{.1111000 \times 2^{4}}{.1010000 \times 2^{3}}$$

                1.100000
   1010000 ) 1111000.000000
             1010000
             -------
              1010000
              1010000
              -------
                    0

2. Exponent: $4 - 3 = 1$

3. Result: $1.100000 \times 2^{1}$
   Normalized result: $.1100000 \times 2^{2}$ (normalized fraction .1100000, normalized exponent 2)
   Move the binary point two places to the right:

$$11.00000_2 = 3_{16} = 3_{10}$$


Historically, minicomputers have been discussed and explained using octal notation. In octal, the bias of
$128_{10}$ is $200_8$, so previous manuals discussed this exponent notation in octal form and called it
excess $200_8$ or excess 200. The VAX-11 computer, however, is discussed using hexadecimal notation.
Unfortunately, VAX-11 documentation has referred to the excess 80 bias not only as $80_{16}$, but also as
$128_{10}$, $200_8$, and $10000000_2$ (and sometimes the base is indicated, sometimes it is not). When
studying the FPA print sets, technical manuals, and microcode listings, be aware of this variation in
terminology. This manual uses hex notation and calls the exponent bias excess 80.


When multiply and divide operations are performed on floating-point numbers with excess 80 exponents,
the resulting exponent must be adjusted by the bias to return the result to excess 80 notation. When a
multiplication is performed, the exponents are added, and $80_{16}$ must be subtracted from the sum to
return it to excess 80 notation. To see why, consider the exponent calculation during multiplication:

$$
(\text{Exponent A} + 80_{16}) + (\text{Exponent B} + 80_{16}) = \text{Exponent A} + \text{Exponent B} + 100_{16}
$$

Both exponent A and exponent B are biased by 80, yielding a combined bias of 100. However, excess 80
notation requires a bias of only 80, so 80 is subtracted.
Multiplication Example: 2 × 3 = 6 (exponents are excess 80, in hex)

   2 = fraction 0.100, exponent 82
   3 = fraction 0.110, exponent 82

   Fraction calculation:  $0.100_2 \times 0.110_2 = 0.011000_2$
   Exponent calculation:  $82 + 82 = 104$, then $104 - 80 = 84$

   6 = fraction 0.011000, exponent 84

Normalize the fraction by left-shifting one place and decreasing the exponent by 1:

   6 = fraction 0.11000, exponent 83


When a division is performed, the exponents are subtracted, and $80_{16}$ must be added to the result to
return it to excess 80 notation. To see why, consider the exponent calculation during division:

$$
(\text{Exponent A} + 80_{16}) - (\text{Exponent B} + 80_{16}) = \text{Exponent A} - \text{Exponent B} + 0
$$

Because the result must be in excess 80 notation, $80_{16}$ must be added to the exponent, yielding
$\text{Exponent A} - \text{Exponent B} + 80_{16}$.


Division Example: 16 / 4 = 4 (exponents are excess 80, in hex)

   16 = fraction .10000, exponent 85
    4 = fraction .10000, exponent 83

   Fraction calculation:  $.10000_2 \div .10000_2 = 1.000_2$
   Exponent calculation:  $85 - 83 = 02$, then $02 + 80 = 82$

Normalize the fraction by right-shifting one place and incrementing the exponent:

   4 = fraction .10000, exponent 83
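The two bias corrections can be captured in a short sketch that operates only on 8-bit biased exponents. It reproduces the exponent calculations of the two examples above, before normalization. The function names are illustrative, and exponent overflow or underflow (a result outside the 8-bit field) is not checked.

```python
BIAS = 0x80  # excess 80 (hex)

def multiply_exponent(biased_a: int, biased_b: int) -> int:
    """Adding two excess-80 exponents doubles the bias, so subtract 80 once."""
    return biased_a + biased_b - BIAS

def divide_exponent(biased_a: int, biased_b: int) -> int:
    """Subtracting two excess-80 exponents cancels the bias, so add 80 back."""
    return biased_a - biased_b + BIAS

print(hex(multiply_exponent(0x82, 0x82)))  # 0x84 (2 x 3, before normalization)
print(hex(divide_exponent(0x85, 0x83)))    # 0x82 (16 / 4, before normalization)
```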




CHAPTER 2 
FUNCTIONAL DESCRIPTION 


2.1 INTRODUCTION 

This chapter explains the algorithms used in the FPA. Section 2.2 discusses the data formats that the FPA
handles. Section 2.3 lists the instructions the FPA executes and explains the FPA operations required to
perform each one. The operation of the FPA is based on instruction flow.


2.2 DATA FORMATS 

The FPA handles single- and double-precision floating-point data and signed integer longwords. It receives 
normalized, packed data from the CPU and returns normalized, packed results to the CPU over 32-bit 
wide buses. Within the FPA, intermediate data is transmitted over two 34-bit wide buses. The data 
formats used by the FPA are compatible with these bus structures as well as the input and output formats 
of the various data manipulation units within the FPA. 


2.2.1 Floating-Point Numbers 

Floating-point numbers consist of a sign bit, exponent bits, and fraction bits. A single-precision floating-
point number is stored in CPU memory as four contiguous bytes starting on an arbitrary byte boundary.
The bits are labeled from the right, 0 through 31. The number is specified by its address A, the address of
the byte containing bit 0 (Figure 2-1). The range of a single-precision floating-point number is
approximately $.29 \times 10^{-38}$ through $1.7 \times 10^{38}$, and the precision is typically seven decimal digits.


A double-precision floating-point number is stored as eight contiguous bytes. Bit labeling and addressing are
the same as for a single-precision floating-point number. A double-precision number has a range similar to
that of a single-precision number, but its precision is about 16 decimal digits (Figure 2-1).


Floating-point numbers are transmitted to the FPA as packed, normalized numbers without a hidden or
overflow bit. A single-precision number has 24 fraction bits and a double-precision number has 56 fraction
bits, counting the hidden bit (23 and 55 bits, respectively, are actually stored and transmitted). Hardware in
the FPA inserts and handles both the hidden and overflow bits. The number is split apart and used in the
various data manipulation units of the FPA. Although all operations begin with normalized operands, the
intermediate results produced by the FPA data manipulation units can vary widely: subtraction of nearly
equal numbers can produce a number very close to 0, and addition and division can produce numbers
close to 2. As a result, intermediate results are transferred between data manipulation units as
unnormalized numbers with both hidden and overflow bits. After the result is normalized, it is returned to
the CPU as a packed, binary normalized number without a hidden or overflow bit.


POLY uses specialized floating-point notation for intermediate results. In POLY, 7 additional bits are used 
for fraction addition. POLY execution consists of multiply, add, multiply, and so on. To maintain 
maximum accuracy while functioning within the limitations of the FPA hardware, seven additional LSBs 
are transferred from the fraction multiply (FMH + FML) hardware to the fraction add hardware (FAD). 
The seven additional bits come from LSH <11:5> along FP bus A <14:08> into AR <06:00> (also called 
ARX). The FPA performs the add on the extended precision number, then transfers the addition result to 
the normalizer logic (FNM) where it is rounded, normalized, and held for the next part of the POLY 
instruction. 
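The multiply, add, multiply, ... sequence described above is Horner's rule. The following sketch shows only the order of operations, using exact `Fraction` arithmetic; the FPA's seven extra addition bits and its per-step rounding in FNM are deliberately not modeled. It assumes the coefficient table is ordered from the highest-degree coefficient first, which is the VAX POLY convention; the function name `poly` is illustrative.

```python
from fractions import Fraction

def poly(arg: Fraction, coefficients: list[Fraction]) -> Fraction:
    """Evaluate a polynomial by Horner's rule: multiply, add, multiply, add, ...

    coefficients are ordered highest degree first. Arithmetic here is exact;
    the FPA's extended addition bits and per-step rounding are not modeled.
    """
    result = coefficients[0]
    for c in coefficients[1:]:
        result = result * arg   # multiply step (fraction multiply, FMH + FML)
        result = result + c     # add step (fraction add, FAD), then FNM rounds and normalizes
    return result

# 2x^2 - 3x + 1 at x = 1/2: (2 * 1/2 - 3) * 1/2 + 1 = 0
print(poly(Fraction(1, 2), [Fraction(2), Fraction(-3), Fraction(1)]))  # 0
```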


[Figure 2-1 Floating-Point Format (Sheet 1 of 2), single precision: a normalized floating-point number; its
computer representation (sign bit, fraction bits, and exponent bits in excess 80 notation); as stored in VAX
memory, transferred to the FPA, and received by the FPA (bits <31:16> low-order fraction, bit <15> sign,
bits <14:7> exponent, bits <6:0> high-order fraction); as transferred on the FPA buses, FP bus A and FP
bus B (unnormalized intermediate results with overflow and hidden bits); as used in the FPA (unpacked,
unnormalized results); ready for return to the CPU (packed, normalized); and as returned to the CPU.
Note 1: A normalized number has a 0 (zero) overflow bit and a 1 hidden bit.]

[Figure 2-1 Floating-Point Format (Sheet 2 of 2), double precision: a normalized floating-point number; its
computer representation; as stored in VAX memory, transferred to the FPA, and received by the FPA (two
transfers: bits 0-31 in the first transfer, bits 32-63 in the second); as transferred on the FP buses
(unnormalized, intermediate results; the complete number, 66 bits, transferred simultaneously); as used in
the FPA (unpacked, unnormalized results); ready for return to the CPU (packed, normalized); and as
returned to the CPU (1st transfer, 32 bits: exponent and most significant fraction bits; 2nd transfer,
32 bits: least significant fraction bits). Note 1: A normalized number has a 0 (zero) overflow bit and a
hidden bit.]
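The single-precision memory layout can be made concrete with a packing sketch. It assumes the standard VAX F_floating longword layout: sign in bit <15>, excess-80 exponent in <14:7>, the 7 high-order stored fraction bits in <6:0>, and the 16 low-order stored fraction bits in <31:16>. The function names are illustrative, and the second longword of a double-precision number (further low-order fraction bits) is not shown.

```python
def pack_f_longword(sign: int, biased_exponent: int, stored_fraction: int) -> int:
    """Place single-precision fields in a longword as laid out in VAX memory.

    stored_fraction is the 23-bit fraction without the hidden bit: its 7
    high-order bits go in <6:0> and its 16 low-order bits in <31:16>.
    """
    if not (sign in (0, 1) and 0 <= biased_exponent <= 0xFF and 0 <= stored_fraction < 1 << 23):
        raise ValueError("field out of range")
    high_fraction = stored_fraction >> 16      # fraction bits 22:16
    low_fraction = stored_fraction & 0xFFFF    # fraction bits 15:0
    return (low_fraction << 16) | (sign << 15) | (biased_exponent << 7) | high_fraction

def unpack_f_longword(longword: int) -> tuple[int, int, int]:
    """Inverse of pack_f_longword: return (sign, biased exponent, stored fraction)."""
    sign = (longword >> 15) & 1
    biased_exponent = (longword >> 7) & 0xFF
    stored_fraction = ((longword & 0x7F) << 16) | ((longword >> 16) & 0xFFFF)
    return sign, biased_exponent, stored_fraction

# 1.0 = .1 x 2^1: sign 0, exponent 81 (hex), stored fraction 0 (hidden bit only)
print(f"{pack_f_longword(0, 0x81, 0):08X}")  # 00004080
```

Splitting the fraction this way keeps the sign, the exponent, and the most significant fraction bits together in the low-addressed 16-bit word.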
The EMOD instruction causes a 32 x 24-bit (64 x 56-bit for double) fraction multiplication to be performed
in the FMH and FML. The extra eight bits in the multiplicand are transferred over the ID bus to FP bus B
lines <07:00> and on to MCINT (also called MCX). MCINT <07:00> drives MCAND bus <07:00> for the
fraction multiply. MPLIER is handled in the usual fashion. The result of the extended-precision multiply is
transferred to the CPU in one 32-bit transfer (F) or two 32-bit transfers (D).
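The widening of the multiplicand can be pictured with a short sketch. This Python example assumes, as the MCAND bus <07:00> connection suggests, that the eight extension bits sit below the 24-bit single-precision fraction (hidden bit included). The function name is hypothetical, only the fraction arithmetic is shown (not the FMH/FML hardware), and the 64 x 56-bit double case is not modeled.

```python
def emod_fraction_product(fraction24, extension8, multiplier24):
    """32 x 24-bit fraction multiply for EMODF (sketch)."""
    if not (0 <= fraction24 < 1 << 24 and 0 <= multiplier24 < 1 << 24):
        raise ValueError("fractions must be 24-bit values")
    # The extension byte drives the low byte of the multiplicand (MCAND <07:00>).
    multiplicand32 = (fraction24 << 8) | (extension8 & 0xFF)
    return multiplicand32 * multiplier24      # at most 56 bits wide
```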


2.2.2 Integer Numbers 

The FPA handles a single integer-format instruction, MULL (multiply longword). A longword is stored
in CPU memory as four contiguous bytes starting on an arbitrary byte boundary. The FPA receives two
32-bit signed integers and multiplies them as unsigned integers to form a 64-bit product, which is returned
to the CPU in two 32-bit transfers (low half first) for further processing. See Figure 2-2 for a summary of
the integer format.


[Figure: integer (MULL) format. Bits 31:0 as stored in VAX memory, transferred to the FPA, and received by the FPA: a 2's complement (signed) number. As transferred on the FPA buses: an unsigned (positive) number; bits 32 and 33 of the FP bus are not used. Result stored in the FPA: high-order half in SALU, low-order half in the LSH register. Result to the CPU (via FP bus A to the DFMX bus): low-order 32 bits in the 1st transfer, high-order 32 bits in the 2nd.]

Figure 2-2 Integer Format
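Because the FPA multiplies the signed operands as unsigned numbers, the high-order half must be corrected before it can be used as part of a signed result. The text does not say how the base-machine microcode does this. The Python sketch below uses one standard correction (for each negative operand, subtract the other operand from the high half) purely as an illustration; fpa_mull, signed_high, and as_signed64 are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def fpa_mull(a, b):
    """FPA side: unsigned 32 x 32 multiply; returns halves in transfer order."""
    product = (a & MASK32) * (b & MASK32)
    low, high = product & MASK32, product >> 32
    return low, high                      # the low half is transferred first


def signed_high(a, b, high):
    """Hypothetical CPU-side fix-up giving the high half of the signed product."""
    a &= MASK32
    b &= MASK32
    if a & 0x80000000:                    # a was negative: remove b * 2**32
        high -= b
    if b & 0x80000000:                    # b was negative: remove a * 2**32
        high -= a
    return high & MASK32


def as_signed64(low, high):
    """Reassemble the two 32-bit halves as a signed 64-bit integer."""
    value = (high << 32) | low
    return value - (1 << 64) if value >> 63 else value
```

For -3 x 5, fpa_mull returns a low half of FFFFFFF1 (hex) and a high half of 4; the correction turns the high half into FFFFFFFF (hex), and the reassembled result is -15. The low half never needs correcting, which is consistent with it being returned first.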


2.2.3 Literals 

The FPA handles float and double-precision literal data. It receives the data from the CPU IB. Float literal 
data is transferred from the IB to the FPA’s literal register (LR) using the ID bus. The FPA then loads the 
LR data into FPA internal registers and begins processing. The first half of double-precision literal data is 
handled similarly. The second half comes from the CPU D-register via the ID bus and is loaded directly 
from the ID bus into the FPA internal registers. 


The FPA handles short literals. Short literals contain only six data bits and are part of the instruction. The
CPU formats the six data bits within the 32-bit data longword based on instruction type. For an integer
instruction (MULL is the only one the FPA handles), the six data bits are zero-extended (26 zeros are
added), so any integer from 0 to 63 (decimal) can be written as a short literal. For a floating-point
instruction, the short literal is assumed to contain three exponent bits and three fraction bits. The IB packs
the data into standard FP format: excess 80 notation for the exponent, a positive sign bit, and a normalized
fraction whose hidden 1 bit is not stored. Refer to Figure 2-3 for the FPA short literal format and Table 2-1
for the data that can be transferred in floating-point short literal form. Notice that only positive numbers
can be transferred. If a double-precision short literal is specified, the FPA accepts the first half and
manufactures zeros to fill the second half.


[Figure: A. Short literal data as stored in the instruction stream: bits 5:3 exponent, bits 2:0 fraction. B. Short literal data as formatted by the IB and transferred to the FPA for a floating-point operation: bit 15 (sign) zero, bit 14 set, bits 13:10 zeros, bits 9:4 data, bits 3:0 zeros.]

Figure 2-3 Short Literal Format


Table 2-1 Floating Literals

                                  Fraction
Exponent   0      1       2       3       4       5       6       7
0          1/2    9/16    5/8     11/16   3/4     13/16   7/8     15/16
1          1      1-1/8   1-1/4   1-3/8   1-1/2   1-5/8   1-3/4   1-7/8
2          2      2-1/4   2-1/2   2-3/4   3       3-1/4   3-1/2   3-3/4
3          4      4-1/2   5       5-1/2   6       6-1/2   7       7-1/2
4          8      9       10      11      12      13      14      15
5          16     18      20      22      24      26      28      30
6          32     36      40      44      48      52      56      60
7          64     72      80      88      96      104     112     120
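Table 2-1 follows directly from the short literal format in Figure 2-3. The Python sketch below packs a 6-bit literal as Figure 2-3B shows it (sign 0, exponent field 80 hex plus the three exponent bits, three fraction bits just below the hidden bit) and then regenerates the table from the packed words. The function names are hypothetical.

```python
from fractions import Fraction


def pack_short_literal(literal):
    """Format a 6-bit floating short literal as the low word of an F operand."""
    if not 0 <= literal <= 63:
        raise ValueError("a short literal has six data bits")
    # Bit 14 set gives exponent field 80 (hex); bits 9:4 carry exponent:fraction.
    return 0x4000 | (literal << 4)


def word_value(word):
    """Value of the packed word: 0.1fff... (binary) x 2**(exponent - 80 hex)."""
    exponent = (word >> 7) & 0xFF
    fraction = Fraction(0x80 | (word & 0x7F), 0x100)  # hidden bit restored
    return fraction * Fraction(2) ** (exponent - 0x80)


# Rebuild Table 2-1: rows are the exponent bits, columns the fraction bits.
TABLE_2_1 = [[word_value(pack_short_literal(e << 3 | f)) for f in range(8)]
             for e in range(8)]
```

Every entry works out to (8 + fraction)/16 x 2**exponent, which is why the table runs from 1/2 to 120.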


The FPA also handles long literals (32 or 64 data bits). Thirty-two bits, either a complete single-precision
operand or the first half of a double-precision operand, are transferred from the IB to the FPA LR. The second half
of the double-precision number is taken directly from the ID bus. Float and double-precision floating-point
data can be transferred in long literal format. The FPA also receives 32-bit integer data in long
literal format. (The FPA does not handle any 64-bit integer operands.)


2.2.4 Zero and Reserved Operand Codes 

The FPA checks all data received for zeros and reserved operands during fraction processing. Both the
zero code and the reserved operand code transmit special information. As discussed in Section 1.5,
the FPA assumes all floating-point numbers to be normalized (fraction between 1/2 and 1) with a hidden
bit that is not stored. The hidden bit is normally inserted by the data manipulation hardware. Zero cannot be
represented as a normalized number, and the hidden-bit hardware compounds the problem: it would turn an
all-zero stored fraction into 1/2. As a result, zero is represented by a code with zeros in the
exponent bits (no excess 200 notation) and a clear sign bit; the fraction bits do not matter. Whenever this
combination of bits is sensed, the FPA accesses special microcode that simulates the properties of
addition, subtraction, multiplication, and division with zero. See Table 2-2 for the result of an operation
with zero and Figure 2-4 for the zero code.


Table 2-2 Zero Operand Microcode

Operation   Operand(s)                      Operation Result
Add         0+X, X+0                        X operand returned
            0+0                             Zero returned*
Subtract    0-X                             -X returned
            X-0                             X operand returned
            0-0                             Zero returned
Multiply    0x0, Xx0, 0xX                   Zero returned*
Divide      0/X (dividend is zero)          Zero returned*
            X/0 (divisor is zero:           Error condition†
            divide by zero)

* Zero code is returned, 0 in sign and exponent.

† FPA informs CPU that division by zero was attempted by asserting FPA error and PSL V bit and
not asserting FP SYNC.
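The rules in Table 2-2 amount to a small dispatch on the operation. The Python sketch below models them with ordinary numbers: 0.0 stands in for the zero code and any other value for a non-zero operand X. The operation names, the DivideByZero class, and the function are hypothetical, and the FPA error and FP SYNC signalling is reduced to raising an exception.

```python
ZERO = 0.0   # stands in for the zero code (0 in sign and exponent)


class DivideByZero(Exception):
    """Hypothetical stand-in for the FPA divide-by-zero error condition."""


def zero_operand_result(op, a, b):
    """Apply Table 2-2 to `a op b` when at least one operand is zero."""
    a_zero, b_zero = a == 0.0, b == 0.0
    if not (a_zero or b_zero):
        raise ValueError("Table 2-2 applies only when an operand is zero")
    if op == "add":                       # 0+X -> X, X+0 -> X, 0+0 -> zero
        return ZERO if a_zero and b_zero else (b if a_zero else a)
    if op == "sub":                       # 0-X -> -X, X-0 -> X, 0-0 -> zero
        if a_zero and b_zero:
            return ZERO
        return -b if a_zero else a
    if op == "mul":                       # any product with zero is zero
        return ZERO
    if op == "div":
        if b_zero:                        # divisor is zero: error condition
            raise DivideByZero("divisor is zero")
        return ZERO                       # dividend is zero
    raise ValueError(f"unknown operation {op!r}")
```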


The code for the reserved operand is zeros (cleared) in the exponent bits and a one (set) in the sign bit. A one
in the sign bit normally indicates a negative number, so this code is sometimes called minus zero. A reserved
operand indicates invalid data: either the data was accessed from a location that had no data loaded into it,
or a previous exception occurred. See Figure 2-4 for the reserved operand code.
ZERO CODE

 31            16   15     14          7   6           0
 DON'T CARE         0      ZERO            DON'T CARE
 (FRACTION)         SIGN   EXPONENT        (FRACTION)

RESERVED OPERAND CODE

 31            16   15     14          7   6           0
 DON'T CARE         1      ZERO            DON'T CARE
 (FRACTION)         SIGN   EXPONENT        (FRACTION)

Figure 2-4 Zero and Reserved Operand Code


2.2.5 Hidden, Overflow, and Guard Bits 

The FPA uses extra fraction data bits during fraction manipulation to completely represent the fraction 
data, to handle result overflow, and to ensure accuracy of fraction result. See Figure 2-5 for location of 
hidden, overflow, and guard bits. 


[Figure: FP bus lines 33 (overflow bit) and 32 (hidden bit) are added and used by the FPA; lines 31:16 (fraction), 15 (sign), 14:7 (exponent), and 6:0 (fraction) carry data from the CPU. The exponent lines are also where guard bits are transferred.]

Figure 2-5 Hidden, Overflow, and Guard Bits


As discussed previously, the CPU stores floating-point numbers in a packed normalized form with the 
MSB of the fraction (called the hidden bit) not stored (since it is always a 1). The FPA receives the 
floating-point numbers in this form. To facilitate fraction calculation, the logic on FNM adds the hidden 
bit to all CPU fraction data as it is transported over the FP buses. The hidden bit is transmitted on FP bus 
<32>. This means that all fraction data received by FPA fraction manipulation units have correct hidden 
bits. 


The FPA also transmits an overflow bit between fraction manipulation units using FP bus <33>. The 
overflow bit handles unnormalized, intermediate fraction results. The combination (addition, subtraction, 
or division) of two normalized fractions can create a result greater than 1. The overflow bit enables the 
FPA to transmit this unnormalized result from the fraction computation units to the fraction normalizer 
logic (FNM). 
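The role of the overflow bit shows up in a single-precision example. This Python sketch holds 24-bit fractions (hidden bit included, value f / 2**24) in integers, with one extra bit above them standing in for FP bus <33>. When the sum of two aligned fractions sets that bit, one right shift and an exponent increment restore normal form. The constants and the function name are illustrative assumptions, not FPA register names.

```python
FRACTION_BITS = 24                    # hidden bit plus 23 stored fraction bits
OVERFLOW_BIT = 1 << FRACTION_BITS     # stands in for FP bus <33>


def add_with_overflow_bit(frac_a, frac_b, exponent):
    """Add two aligned fractions sharing one exponent; renormalize if the sum is >= 1."""
    total = frac_a + frac_b
    overflowed = bool(total & OVERFLOW_BIT)
    if overflowed:
        total >>= 1                   # this drops the least significant bit
        exponent += 1
    return total, exponent, overflowed
```

For instance, 1.5 + 1.5 (fraction C00000 hex, exponent 81 hex, for each operand) sets the overflow bit and normalizes to fraction C00000 with exponent 82, that is 0.75 x 4 = 3.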


To ensure accuracy of fractional results, the FPA data manipulation units add seven zeros, called guard
bits, to the low-order end of the fraction data received. As a result, a float fraction is 32 bits wide and a
double fraction 64 bits wide. The POLY instruction loads extra data bits rather than zeros at the low-order end
of each coefficient fraction. The instruction also transfers additional low-order data bits from the fraction
multiply logic to the fraction add logic. These guard bits are dropped each time the POLY accumulation is
normalized and rounded, but they ensure that the final answer is accurate. The guard bits are necessary when
an FP fraction is right-shifted to align radix points for addition and subtraction or to normalize the result.
Without them, the bits shifted off the right end of the fraction would be lost, and in some cases this loss
would cause the last bit of the normalized result to be wrong. Guard bits are transmitted between FP data
manipulation units on FP bus A <14:08>, lines that normally transmit exponent data. This arrangement
allows the FPA to maximize accuracy without additional hardware overhead.
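The "last bit wrong" case can be reproduced with a small Python sketch. It assumes a normalized single-precision fraction a, an operand b aligned by a right shift, and a sum that stays below 1 (so no overflow or normalization shift is needed). Rounding by adding half an LSB before the guard bits are dropped is also an assumption. The function names are hypothetical.

```python
from fractions import Fraction

GUARD_BITS = 7     # zeros appended by the FPA data manipulation units


def add_shifted(a, b, shift, guard_bits):
    """Return a + (b >> shift) as a 24-bit fraction (value = f / 2**24).

    With guard bits the sum is rounded by adding half an LSB before the
    guard bits are dropped; without them the shifted-off bits are simply gone.
    """
    a_ext = a << guard_bits
    b_ext = (b << guard_bits) >> shift      # bits beyond the guard bits are lost
    total = a_ext + b_ext
    if guard_bits:
        total = (total + (1 << (guard_bits - 1))) >> guard_bits
    return total


def exact_nearest(a, b, shift):
    """Correctly rounded 24-bit result, computed exactly for comparison."""
    return round(Fraction(a) + Fraction(b, 2**shift))
```

With a = 800000 and b = FFFFFF (hex) and an exponent difference of 2, the version without guard bits returns BFFFFF (hex), one LSB below the correctly rounded C00000 (hex) that the guarded version produces.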


2.2.6 Overflow, Underflow, Zero, and Reserved Operands 

The FPA monitors all operands and results for exceptional conditions. When the FPA senses one or more
of these conditions, it informs the CPU via various bits and combinations of bits. One or both units (FPA and
CPU) then begin special operations designed to minimize the effect of the condition. In some cases the FPA's
current operation is stopped and the FPA is returned to the IRD state, where all logic and registers are cleared in
anticipation of a new FP instruction. The following sections discuss these unusual conditions. Table
2-3 summarizes the FPA and CPU operations caused by the unusual conditions.


2.2.6.1 Overflow and Underflow — The FPA can handle a very large, but bounded, range of numbers.
Numbers too large (overflow) or too small (underflow) cannot be accurately handled (see Figure 2-6).
Special hardware monitors the results of all FPA operations for overflow and underflow conditions by
monitoring the exponent results. The monitoring is straightforward because of the excess 80 notation used.
If the exponent with its excess 80 bias exceeds FF (hex), an overflow has occurred. If the biased exponent is
0 or less, an underflow has occurred (an exponent field of 0 is reserved for the zero and reserved operand codes).


If an overflow condition is sensed, the overflowed number is useless. The FPA manufactures a reserved 
operand and informs the CPU that an overflow occurred. The CPU notes the overflow and stores the 
reserved operand. The FPA then returns to IRD. 


Underflow is not as serious a problem. It merely indicates that the number is so small and so close to zero 
that the FPA cannot accurately represent it. If an underflow occurs, the FPA sets the underflowed number 
to zero and informs the CPU that an underflow has occurred by asserting both FP SYNC and ERR SYN. 
It is important to inform the CPU that a zero has been returned because the CPU may at some later time 
attempt a division by the result (division by zero results in an error). 
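The exponent check reduces to two comparisons on the biased exponent. The Python sketch below assumes the result exponent is available as an unbounded integer before packing, and it also derives the range limits shown in Figure 2-6. The names are hypothetical.

```python
EXPONENT_BIAS = 0x80   # excess 80 (hex)

# Largest value: fraction all ones, exponent field FF; smallest: fraction 1/2, exponent field 01.
LARGEST = (1 - 2.0**-24) * 2.0 ** (0xFF - EXPONENT_BIAS)
SMALLEST = 0.5 * 2.0 ** (0x01 - EXPONENT_BIAS)


def check_exponent(biased_exponent):
    """Classify a biased result exponent the way the monitoring hardware is described."""
    if biased_exponent > 0xFF:
        return "overflow"     # FPA manufactures a reserved operand
    if biased_exponent <= 0:
        return "underflow"    # FPA returns the zero code; field 0 is reserved
    return "ok"
```

LARGEST evaluates to about 1.7 x 10^38 and SMALLEST to about 0.29 x 10^-38, matching the bounds in Figure 2-6.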


2.2.6.2 Zero — If a zero code is encountered in an operand transmitted to the FPA from the CPU, FPA 
microcode simulates the special properties of addition, subtraction, multiplication, and division with zero. 
Refer to Table 2-2 for the result of an operation with zero. If an exact zero is generated as a result of an 
FPA operation, the zero code is returned to the CPU and the condition code bits are set for a zero result. 
Zero can be generated in a normal arithmetic add or subtract operation (adding two operands of equal
magnitude and opposite sign, or subtracting two equal operands) or in a microcode-simulated arithmetic
operation with a zero operand. An operation that generates an exact zero does not assert ERR SYN as an
underflow does (although both return a zero code).


2.2.6.3 Reserved Operand — Refer to Table 2-3 for the condition codes returned to the CPU when a 
reserved operand is encountered by the FPA. 


Table 2-3 Exception Conditions

ADD, SUBT, MULT, EMOD
  Zero operand:      Microcode simulates the arithmetic operation with zero (Table 2-2).
  Reserved operand:  FPSYNC (ACC0) clear, ERRSYNC (ACC1) set. CPU traps FPA to IRD.

DIVIDE
  Zero operand:      ZERO DIVIDEND: Microcode returns zero as the result.
                     ZERO DIVISOR: Divide by zero ERROR. FPSYNC (ACC0) clear,
                     ERRSYNC (ACC1) set, PSL V bit set.
  Reserved operand:  FPSYNC (ACC0) clear, ERRSYNC (ACC1) set, PSL V bit clear.
  The CPU differentiates between ZERO DIVISOR and RESERVED OPERAND by examining
  the PSL V bit. In both cases, the CPU traps the FPA to IRD.

POLY*
  Zero operand:      POLY microcode simulates POLY operations with zero.
  Reserved operand:  FPSYNC (ACC0) set, ERRSYNC (ACC1) set. In the STATUS REGISTER,
                     the MINUS ZERO ERROR bit is set. The CPU checks whether the
                     argument is a RESERVED OPERAND; the FPA checks whether a
                     coefficient is a RESERVED OPERAND.

MULL
  No checking of MULL operands or results is performed by FPA software or
  hardware. Any combination of bits can be interpreted as an acceptable integer.

Result: All operations handle the occurrence of zero, underflow, and overflow results similarly.*
  ZERO:       The zero code and FPSYNC are sent. PSL Z bit is set.
  UNDERFLOW:  Zero code, FPSYNC, and ERRSYNC are sent. PSL Z is set. If PSL U
              (underflow) is set, underflow causes a trap; otherwise operations continue.
  OVERFLOW:   Reserved code, FPSYNC, and ERR SYNC are sent. PSL V is set.
              CPU traps FPA to IRD.

* When POLY flows note a RESERVED OPERAND, UNDERFLOW, or OVERFLOW, both FPSYNC (ACC0)
and ERRSYNC (ACC1) are set. The CPU examines the PSL and the FPA STATUS REGISTER to determine the
exception condition. RESERVED OPERAND sets the MINUS ZERO ERROR bit. OVERFLOW sets the PSL V bit.
UNDERFLOW sets the PSL Z bit.


[Figure: number line showing, from left to right, the overflow range below the most negative number (about -1.7 x 10^38); the range of negative numbers up to the smallest negative number (about -0.29 x 10^-38); the underflow range around zero (exact zero does not cause underflow); the range of positive numbers from the smallest positive number (about 0.29 x 10^-38) to the largest (about 1.7 x 10^38); and the overflow range above it.]

Figure 2-6 Overflow and Underflow Ranges


2.3 INSTRUCTIONS AND ALGORITHMS 

This section concentrates on the microcontrol used to carry out each FPA instruction. Each instruction 
accesses different microcontrol addresses to correctly move and load operands, compute intermediate 
results, and ready the final result for return to the CPU. Special instructions check for and handle errors 
and exceptional conditions. 


This section details the data flow within hardware required to carry out the selected instruction. It only 
summarizes the hardware actions started once the data has been loaded by the microcontrol. Section 3.2 
contains a complete and detailed description of the hardware in each FPA section. Sections 2.3 and 3.2 
complement each other and both should be read to thoroughly understand the hardware implementation of 
each FPA instruction. 


As stated before, this section concentrates on data flow. Figure 2-7 shows the data bus interconnections 
and the various registers in the FPA. Although this figure is not specifically referenced in the discussion, it 
helps in understanding the data flow and should be referred to frequently. 


During IRD (instruction decode) the FPA performs some operations that are prerequisites to many FPA 
instructions. The FPA assumes a register-to-register (R-R) float instruction and begins FPA register 
loading. The FPA has two copies of the CPU general registers. During IRD, the FPA receives specifier
information from the IB and accesses the registers those specifiers address. The contents of the first specifier
register are placed on FP bus A; the contents of the second on FP bus B.


The data on bus A is loaded into AR1, LA, SA, MC1, and MP0; bus B loads BR1, LB, SB, MP1, and MCI.
AR1 and BR1 are fraction registers used for the addition and subtraction of floating-point numbers. LA
and LB are loaded with the exponents of the numbers, and the hardware immediately begins an exponent
difference calculation. (It is necessary to know the difference between the exponents and to identify the
larger exponent for floating-point additions, subtractions, and multiplications.) SA and SB are input
registers for the sign-processing hardware. Fraction data from specifier 1 (on bus A) is loaded into the multiply
registers MC1 (multiplicand) and MP0 (multiplier). Fraction data from specifier 2 (on bus B) is loaded
into MP1 (multiplier) and MCI (multiplicand-integer). MC1 and MP1 hold operand data for MULF and
EMODF instructions. The hardware multiply begins the MULF or EMOD fraction multiply operation
during IRD using MC1 and MP1. MCI and MP0 contain the operands for a MULL instruction.
Figure 2-7 FPA Block Diagram


During IRD, numerous FPA instructions are started. If the instruction is a float R-R, both operands are 
already loaded and ready in the FPA. Exponent manipulations needed for add, subtract, and multiply 
operations are started. MULF and EMODF fraction multiplications are also started. If the instruction 
decoded is a MULL, the multiplier and multiplicand are already loaded into the proper registers. 


2.3.1 Add/Subtract 
The FPA add/subtract operations can be broken into three states: load, add/subtract, and normalize. 


2.3.1.1 Load — While the FPA is in IRD, it sets up for a float R-R operation. This means that specifiers 
1 and 2 from the instruction buffer are being placed on FP buses A and B, respectively. Bus A loads AR1 
(fraction register), LA (exponent register), and SA (sign latch). Bus B loads BR1, LB, and SB. 


When the FPA decodes a floating-point instruction, it enters A-Fork and selects a microword address 
based on op code and specifier types. If the instruction is a float R-R add/subtract, the FPA immediately 
enters the optimized add/subtract execution state. If it is not, the FPA receives and stores the
required data during A-Fork and possibly B-Fork flows under control of the selected microword. If it is
double-precision, 32 additional fraction bits are loaded into both AR0 (extension of AR1) and BR0
(extension of BR1). If it is not an R-R operation, the new data from the correct source is loaded into AR1,
LA, SA, BR1, LB, and SB.


As the final correct operands are loaded, whether during IRD (in the case of float R-R operations) or 
during some following microcontrol state in A-Fork or B-Fork, the exponent difference of the two 
operands is determined by comparing LA and LB in DALU and CALU. Based on the exponent difference, 
the fraction associated with the smaller exponent is loaded into SHMX and right-shifted by ASHR until 
the radix points align. 
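The alignment step can be sketched in a few lines of Python. Fractions are 24-bit integers with the hidden bit included (value f / 2**24), guard bits are omitted for brevity, and the larger exponent is kept because it becomes the exponent of the result. The function name is hypothetical.

```python
def align(exp_a, frac_a, exp_b, frac_b):
    """Align two fractions on a common (larger) exponent.

    Returns (larger exponent, aligned frac_a, aligned frac_b); the fraction
    belonging to the smaller exponent is right-shifted by the exponent difference.
    """
    difference = exp_a - exp_b          # the FPA derives this from LA and LB
    if difference >= 0:
        return exp_a, frac_a, frac_b >> difference
    return exp_b, frac_a >> -difference, frac_b
```

Aligning 2 (exponent 82 hex) with 0.5 (exponent 80 hex) shifts the second fraction right by two places. Adding the aligned fractions gives A00000 (hex) with exponent 82, that is 0.625 x 4 = 2.5.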


2.3.1.2 Add/Subtract - The fractional result is computed in this state. 
The FALU operation is selected based on the op codes, signs of the operands, and exponent difference. 


Normally, the FALU adds or subtracts the already aligned fractions to form the fractional result. Refer to
Table 2-4 for normal FALU operation and Table 2-5 for the criteria that initiate the special FAD operation.


Table 2-4 FALU Operation

Op Code    Operand Signs    FALU Operation
ADD        Same             Add
ADD        Different        Subtract
SUBT       Same             Subtract
SUBT       Different        Add
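Table 2-4 is the usual sign-magnitude rule: the FALU adds magnitudes when the effective signs agree and subtracts them otherwise. The following Python sketch expresses that rule. The op code strings match the table, and the function name is hypothetical.

```python
def falu_operation(op_code, sign_a, sign_b):
    """Choose the FALU operation for A op B from the op code and operand sign bits."""
    same_signs = sign_a == sign_b
    if op_code == "ADD":
        return "add" if same_signs else "subtract"
    if op_code == "SUBT":
        return "subtract" if same_signs else "add"
    raise ValueError(f"unexpected op code {op_code!r}")
```

For example, ADD of +3 and -2 selects a FALU subtract (magnitude 1), while SUBT of +3 and -2 selects a FALU add (magnitude 5).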


Table 2-5 Combination of Conditions Initializing Special FAD Operation 


Exponent Diff Op Code 


Greater than 7 4 
Greater than 1 POLY 
Less than 2 


FALU Subtract 


Precision 


Yes 


X = Don’t care 
The special FAD operation is used to ensure maximum accuracy in the result while operating within the
FPA hardware constraints. The special FAD operation involves the following. 


1. Complementing the fraction associated with the smaller exponent by subtracting the fraction 
from zero in the FAD. 


2. Returning the complemented number to the fraction register it was in originally (either AR or 
BR). 


3. Loading the number into SHFMX and right-shifting and sign-extending based on exponent 
difference until the radix points align. 


This special operation takes an extra microstep but ensures maximum accuracy. As a result, the actual
fraction subtraction that produces the result does not take place until the third (normalize) state.
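The three steps listed above can be mirrored in a short Python sketch. It assumes two's complement arithmetic at the 32-bit float fraction width given in Section 2.2.5. The register move in step 2 is not modeled, and the function names are hypothetical. One property is visible in the sketch: because the sign-extending shift rounds toward minus infinity, the final sum equals the exact difference rounded down in its last place.

```python
WIDTH = 32                      # float fraction width in the FPA (Section 2.2.5)
MASK = (1 << WIDTH) - 1


def to_signed(value):
    """Interpret a WIDTH-bit pattern as a two's complement number."""
    return value - (1 << WIDTH) if value & (1 << (WIDTH - 1)) else value


def special_fad(larger_exp_fraction, smaller_exp_fraction, exponent_difference):
    # Step 1: complement by subtracting the fraction from zero.
    complemented = (0 - smaller_exp_fraction) & MASK
    # Step 2: the complement returns to AR or BR (a register move, not modeled).
    # Step 3: right-shift with sign extension until the radix points align;
    # Python's >> on a negative int is an arithmetic (sign-extending) shift.
    aligned = to_signed(complemented) >> exponent_difference
    # Normalize state: the subtraction is completed as an addition.
    return (larger_exp_fraction + aligned) & MASK
```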


During the add/subtract state, the larger exponent is transferred to the PR. 


2.3.1.3 Normalize — In this state, the answer is readied for return to the main machine. This involves
final normalization of the fraction, adjustment of the exponent, and determination of the resultant sign. If
the calculation involved special FAD operations as discussed in the previous section, the fraction
subtraction is carried out first and the result is then readied for return to the main machine.


When entering the normalization flows, the FPA checks three conditions. 


1. Exponents equal zero.
2. FALU subtract with exponent difference less than two.
3. Subtract, exponent difference less than seven, and DP.


If a zero operand is noted, the other (non-zero) operand is transferred to the output. If the non-zero
operand is the subtrahend in a FALU subtraction, its sign is complemented (minuend - subtrahend =
difference; 0 - X = -X). A FALU subtraction with an exponent difference of 1 or 0 initiates special flows
because the subtraction of two nearly equal numbers can result in a very small fraction (numerous leading
zeros), which might require many shifts before the first significant bit is located. The special flow can
shift the result up to 60 places to find the first significant bit before it is transferred to the standard
normalize routine. If no significant bit is found after 60 bits have been shifted, a zero is readied as the
result. If the third branch is taken, the add/subtract state described in Section 2.3.1.2 is executed; flow then
reenters the normalization routine.


Usually the unnormalized result requires a shift of four places or less. In that case, the four MSBs are
examined to locate the first significant bit. Based on the location of the first significant bit, a rounding byte
is added to the fraction. If the result of a FALU subtraction is negative, the FALU result is subtracted
from the rounding byte to return the number to sign-magnitude notation and round it in a single step. Once
the FALU result has been added to or subtracted from the rounding byte, the fraction is shifted and the least
significant bits are dropped.
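The round-then-shift order described above can be sketched in Python for a non-negative single-precision result carrying seven guard bits. The sketch assumes the rounding byte places a 1 just below the last retained bit. It ignores the negative-result path (subtracting from the rounding byte) and the 60-place limit of the special flow. All names are hypothetical.

```python
RESULT_BITS = 24   # single-precision fraction including the hidden bit
GUARD_BITS = 7     # guard bits below the retained fraction
HIDDEN_POS = RESULT_BITS + GUARD_BITS - 1   # bit 30: the hidden-bit position


def round_and_normalize(fraction, exponent):
    """Normalize and round a non-negative unnormalized FALU result.

    fraction: magnitude with value fraction / 2**31 (bit 31 is the overflow
    bit, bits 6:0 are guard bits). Returns (24-bit fraction, exponent).
    """
    if fraction == 0:
        return 0, 0                               # no significant bit: zero result
    lead = fraction.bit_length() - 1              # first significant bit
    lsb = lead - (RESULT_BITS - 1)                # last bit that will be kept
    if lsb > 0:
        fraction += 1 << (lsb - 1)                # assumed rounding: add half an LSB
        if fraction.bit_length() - 1 > lead:      # rounding carried into a new bit
            lead += 1
            lsb += 1
        fraction >>= lsb                          # shift and drop the low-order bits
    else:
        fraction <<= -lsb                         # very small result: shift left
    return fraction, exponent + (lead - HIDDEN_POS)
```

The exponent adjustment is simply the distance between the first significant bit and the hidden-bit position, which corresponds to the shift count used to adjust the exponent in the PR.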


In all cases, the number of shifts required to ready the fraction for return to the CPU is computed and is 
used to adjust the exponent in the PR. Once completed, the exponent, the normalized fraction, and the 
sign of the result are placed on the FP bus A. When the complete result is on the bus, standard routines 
handle the actual transfer to the main machine. 
2.3.2 Multiply (Floating-Point)
The FPA multiply operation can be broken into three states: load, multiply, and normalize. In the process 
of carrying out an FP multiply, the FPA performs the following. 


1. Receives the operands (each consisting of exponent, fraction, and sign bits).

2. Checks for zeros and reserved operands.

3. Loads the exponent, fraction, and sign bits into the appropriate registers.

4. Starts the hardware to carry out the required calculations.

5. Assembles and readies the result for return to the CPU when notified that the hardware
calculation is finished.


2.3.2.1 Load — To maximize speed, the FPA is continuously setting up for a float R-R operation. This 
means that in IRD, specifiers 1 and 2 from the instruction buffer are addressing the general-purpose 
registers (GPRs) in the CPU, and the register data is being placed on FP buses A and B, respectively. Bus 
A loads MC1 (multiplicand register), LA (exponent register), and SA (sign latch). Bus B loads MP1 
(multiplier register), LB, and SB. 


When the FPA decodes a floating-point instruction, it enters A-Fork and branches to a specific microword 
based on op code and specifier types. If the instruction is a float R-R multiply, the operands are already 
loaded and the FPA immediately enters the multiply state. If it is not a float R-R multiply, the FPA, 
under control of the selected microword, receives and stores the required data during A-Fork and possibly 
B-Fork flows. If it is a double-precision multiply, 32 additional fraction bits are loaded into both MC0
(extension of MC1) and MP0 (extension of MP1). If one or both of the specifiers are not registers, all new
data is loaded into MC1, LA, SA, MP1, LB, and SB.


As the final correct operands are loaded, whether during IRD (in the case of float R-R operations) or 
during some following microcontrol state, the fraction multiplier begins the fraction multiply by breaking 
the fractions into nibbles and beginning the hardware multiplication using the first multiplier nibble. 


2.3.2.2 Multiply — In the multiply state, the fraction multiplication continues until a final fraction (as yet 
unnormalized) is computed, the exponents are added, and the sign of the result is computed. The fraction 
multiplication is initiated when the multiply flows issue MCONT (multiply continue). 


As MCONT is issued, the FPA checks for operands equal to zero or minus zero (reserved operand). If a 
zero operand is found, computation stops and the FPA immediately returns a zero to the base machine. If 
a reserved operand is found, the operation aborts. If neither is found, computation continues. In the case of 
a float (single-precision) multiply, the fraction multiplication is completed as the exponent calculation is 
completed. The product is transferred to the NR. In a double-precision multiply, the microcontrol enters a
wait state. While waiting during a double-precision multiply, the FPA continually transfers the output of 
the fraction multiplier to the normalizer. This enables the FPA to begin normalizing the fraction result as 
soon as the multiplication is complete. The FPA remains in the wait state until a hardware counter in the 
fraction multiply logic asserts MUL/DIV DONE indicating the fraction multiply is complete. 


While the fraction multiply and the check for zeros and reserved operands are taking place, the exponents
are added. If no zeros or reserved operands are found, the fraction multiply and exponent processing
continue. After the exponents are added, a bias of 200 (octal), or 80 (hex), is subtracted from the exponent
result to return the exponent to excess 80 notation (refer to Section 1.6).


In a multiply operation, the sign of the result is the exclusive-OR of the operand signs. 
By the time the fraction multiply is complete, the exponents have been added, exponent bias subtracted,
and the sign of the result calculated. The result of the fraction multiply is moved to NR. 


2.3.2.3 Normalize - The normalize state of a floating-point multiply is very simple. Since the input 
operands are always between 1/2 and 1, the result is always between 1/4 and 1. This means that the result 
can be normalized with a single shift of four bits or less. In the normalize state, the fraction is rounded and 
shifted and the exponent is adjusted to reflect the normalization shift. The normalized fraction, adjusted 
exponent, and sign bit are placed on the FP bus A. Once the complete result is on the bus, standard 
routines handle the actual data transfers to the main machine. 
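The load, multiply, and normalize states can be condensed into one Python sketch for single precision. Several assumptions apply: operands arrive already unpacked as (sign, biased exponent, 24-bit fraction with hidden bit); zero and reserved-operand checks are omitted; and rounding adds half an LSB. Because both fractions lie in [1/2, 1), the product needs at most a one-bit normalizing shift, within the bound stated above. The function name is hypothetical.

```python
BIAS = 0x80   # excess 80 (hex)


def multiply_f(a, b):
    """Multiply two unpacked F operands given as (sign, biased exponent, 24-bit fraction).

    Fractions include the hidden bit (value f / 2**24), so each lies in [1/2, 1).
    """
    sign_a, exp_a, frac_a = a
    sign_b, exp_b, frac_b = b
    sign = sign_a ^ sign_b                     # exclusive-OR of the operand signs
    exponent = exp_a + exp_b - BIAS            # add exponents, subtract one bias
    product = frac_a * frac_b                  # 48 bits, value product / 2**48
    if not product & (1 << 47):                # product in [1/4, 1/2): one left shift
        product <<= 1
        exponent -= 1
    product += 1 << 23                         # assumed rounding: add half an LSB
    if product >> 48:                          # rounding carried up to 1.0
        product >>= 1
        exponent += 1
    return sign, exponent, product >> 24       # keep 24 bits
```

For example, 1.5 x -2.0, given as (0, 81, C00000) and (1, 82, 800000) in hex, yields (1, 82, C00000), which is -3.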


2.3.3 MULL (Multiply Integer Longword) 

The FPA’s MULL algorithm is the simplest and most straightforward of all the operation flows. The FPA 
receives two 32-bit signed integers, performs an unsigned multiplication, and returns the 64-bit answer to 
the base machine. The FPA performs no result normalization or checks for reserved operands, zero 
operands, or other error conditions. Microcode in the base machine generates the condition codes and 
handles all the checks and manipulations required to ensure a correct result. 


2.3.3.1 Load — As discussed in Section 2.3, during IRD the FPA loads MP0 and MCI (the two registers
used in MULL operations) with the register contents of specifiers 1 and 2, respectively. If the instruction
decoded in the A-Fork flows is an R-R MULL, the FPA can begin the multiply immediately. If the
instruction is a MULL but not an R-R, the FPA, under the control of the selected microaddress, loads
data from the correct source into either or both MP0 and MCI.


2.3.3.2 Multiply and Return - The decoding of a MULL causes the fraction multiply hardware to
abandon setup of a MULF and begin accessing the registers used for MULL (MCI and MP0). When the
proper data has been loaded, the FPA issues MCONT. This indicates to the fraction multiply
hardware that the correct data is in MP0 and MCI, and that the data accesses started previously were
accessing correct data.
MCONT enables the fraction multiply hardware to continue multiplying. The multiply continues,
controlled by a hardware sequencer within the fraction multiply hardware, while the FPA waits two machine
cycles. The answer accumulates in ACCM and LSH. After two wait cycles, the multiply is finished. The
hardware stops and the FPA makes the 32 low-order bits (from LSH) available to the CPU. When the
CPU responds with CPSYNC, indicating the low-order bits have been stored, the FPA readies the high 32
bits from SALU for transmission to the CPU.

2.3.4 Divide 
The FPA divide operation can be broken into three states: load, divide, and normalize. To do a floating- 
point divide, the FPA performs the following. 

1. Receives the operands (each consisting of sign, fraction, and exponent bits). 

2. Loads the operands into holding registers. 

3. Transfers the operands from the holding registers into the correct division registers. 

4. Starts the hardware to perform the fraction division. 

5. Checks for zero and reserved operands.

6. Starts the hardware to store the result.

7. Normalizes and packs the result for return to the CPU.
2.3.4.1 Load — The loading of division operands takes place in two substeps: data fetch and division
register load. Unlike the FPA add/subtract, multiply, and MULL operations, the FPA does not load 
division operands into the proper division registers during IRD (Table 2-6). 


Table 2-6 The Division Load

IRD
  Specifier 1: Register and float assumed (divisor). Register data to AR1, LA, SA.
  Specifier 2: Register and float assumed (dividend). Register data to BR1, LB, SB.

Data Fetch Substep
  Op code decoded; specifiers and precision known.
  Specifier 1: New data loaded into AR1 and AR0*, LA, and SA, if needed.
  Specifier 2: New data loaded into BR1 and BR0*, LB, and SB, if needed.

Division Register Load Substep (2 microwords)
  1st microword:
    Specifier 1: Move LA (divisor exponent) to XR.
    Specifier 2: Move BR (dividend fraction) to NR.
  2nd microword:
    Specifier 1: Move AR (divisor fraction) to the just vacated NR.
    Specifier 2: Move NR (dividend fraction) to RR and right-shift the just loaded
                 dividend fraction to compensate for RR's hard-wired left shift. This
                 right shift ensures the initial dividend is properly represented.
    Subtract XR (divisor exponent) from LB (dividend exponent).

* AR0 and BR0 are fraction extension registers for double-precision operations.


During IRD an R-R float operation is assumed; that is, both specifier 1 and specifier 2 are assumed to be
registers. The contents of the first register named are placed in AR, LA, and SA. The contents of the second
register named are placed in BR, LB, and SB. If the operation decoded is an R-R float divide, the data fetch
substep needs no new data and division register load may begin.


However, if the operation decoded is not an R-R float, divide microcode waits for data from the correct
specifier and loads it into either AR1, LA, and SA; or BR, LB, and SB; or both. When the divisor is in 
AR, LA, and SA, and the dividend is in BR, LB, and SB, the data fetch substep is finished. 


The division register load substep loads the divisor’s and the dividend’s fraction and exponent components 
into the registers required to perform a division. The loading of the proper registers takes two microcode 
steps. The first microcode step loads the divisor exponent into XR and loads the dividend fraction into the 
NR. The second microcode step finishes the register loading by moving the dividend fraction (in the NR) 
to the RR, and loading the just vacated NR with the divisor fraction from the AR. The second microcode 
step also starts the fraction division hardware, checks for zeros and reserved operands, and subtracts the 
divisor exponent (XR) from the dividend exponent (LB) (LB-XR). 
2.3.4.2 Divide — The divide operation continues unless a zero or reserved operand is found. If a zero
dividend is found, operations cease and a zero is readied for return to the CPU. Finding a zero divisor or a 
reserved operand initiates error states. The FPA remains in these error states until returned to IRD by a 
CPU signal. 


If no zeros or reserved operands are found, the division continues. A bias of 80₁₆ is added to the result of
the exponent subtraction to return it to excess-80₁₆ notation (Section 1.6). The fraction multiply hardware is
started and is used to store the result of the fraction division as it is generated. The division continues
under hardware control while the FPA microcode remains in a divide wait loop.
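
The bias bookkeeping can be checked with a few lines of Python. This is a minimal sketch, assuming 8-bit
exponents stored in excess-80₁₆ (excess-128) form; the helper names encode, divide_exponent, and
multiply_exponent are hypothetical, not FPA microcode names. Subtracting two biased exponents cancels the
bias, so it must be added back; adding two biased exponents doubles it, so one bias is removed (the
XR+LB-128 step used by the multiply and POLY flows).

```python
BIAS = 0x80  # excess-80 (hex), i.e. excess-128, exponent bias


def encode(true_exponent: int) -> int:
    """Return the biased form of a true (unbiased) exponent."""
    return true_exponent + BIAS


def divide_exponent(lb: int, xr: int) -> int:
    """Dividend exponent LB minus divisor exponent XR, re-biased.

    (a + 80h) - (b + 80h) = a - b, so the bias is added back.
    """
    return lb - xr + BIAS


def multiply_exponent(xr: int, lb: int) -> int:
    """Sum of two biased exponents with one bias removed (XR + LB - 128)."""
    return xr + lb - BIAS


# 2**5 / 2**2 -> 2**3 and 2**5 * 2**2 -> 2**7, before any normalize shift
print(hex(divide_exponent(encode(5), encode(2))))    # 0x83
print(hex(multiply_exponent(encode(5), encode(2))))  # 0x87
```

The results may still fall outside the 8-bit range; that case is left to the overflow and underflow checks.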


The hardware uses the restoring (repeated subtraction) technique to divide. The dividend is initially
loaded into the RR and the divisor is stored in the NR. The divisor (contents of NR) is subtracted from the
dividend (contents of RR). If the result is negative, a 0 is left-shifted into the result register in the fraction
multiply hardware, and the contents of the RR are left-shifted by one. If the result is positive or zero, a 1 is
left-shifted into the result register, and the difference is loaded into the RR left-shifted by one. The
divisor (contents of NR) is repeatedly subtracted from the contents of the RR until 26 bits (58 bits for
double precision) of quotient have been generated. MUL/DIV DONE is then asserted.
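
The restoring loop can be modeled on integers. This is a sketch under stated assumptions, not the FPA
logic: both fractions are held as width-bit integers with the top bit set (a normalized value between 1/2
and 1), and the left shift that the RR performs in hardware is written out explicitly, so the dividend is not
pre-shifted right as in the microcode. The name restoring_divide is hypothetical. The returned integer q
stands for the quotient q / 2^(quotient_bits - 1), so its first bit is the quotient's integer bit.

```python
def restoring_divide(dividend: int, divisor: int, quotient_bits: int,
                     width: int = 24) -> int:
    """Generate quotient_bits quotient bits by restoring division."""
    top = 1 << (width - 1)
    for value in (dividend, divisor):
        if not top <= value < 2 * top:
            raise ValueError("operands must be normalized width-bit fractions")
    rr = dividend              # remainder register (RR)
    quotient = 0               # result register in the multiply hardware
    for _ in range(quotient_bits):
        difference = rr - divisor              # RR - NR
        if difference < 0:
            quotient = quotient << 1           # shift in a 0
            rr = rr << 1                       # restore: keep old RR, shift left
        else:
            quotient = (quotient << 1) | 1     # shift in a 1
            rr = difference << 1               # keep the difference, shift left
    return quotient


# 0.75 / 0.5 = 1.5 -> binary 1.1000..., i.e. 0b11 followed by 24 zeros
print(bin(restoring_divide(0xC00000, 0x800000, 26)))
```

Because each step either keeps or discards one copy of the divisor, q equals
(dividend << (quotient_bits - 1)) // divisor whenever both inputs are normalized.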


Asserting MUL/DIV DONE stops the division and ends the divide wait loop. The divide result is
transferred from the fraction multiply hardware, where it was stored as it was generated, to the normalize
register (NR) in the normalize hardware.


2.3.4.3 Normalize — Since the two initial operands are normalized (between 1/2 and 1), the result is 
always positive and between 1/2 and 2. This means the normalize and round operation is simple and takes 
only one microstep. The result is examined, a round byte is selected and added, and the data is shifted as 
needed to produce a normalized result. The exponent result is adjusted to reflect the direction and amount 
of the fraction shift. The normalized fraction, adjusted exponent, and sign bit are placed on the FP bus(es). 
Once the result is on the bus(es), standard storage routines handle the actual transfer to the CPU. 
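
Because the quotient lies between 1/2 and 2, at most one right shift (plus a possible carry from rounding) is
needed, which is why one microstep suffices. The sketch below assumes the quotient arrives as an integer q
whose value is q / 2^(quotient_bits - 1); round-half-up at the first discarded bit is an assumption
standing in for the FPA's round-byte selection, which is not modeled. normalize_quotient is a hypothetical
name.

```python
def normalize_quotient(q: int, quotient_bits: int = 26, frac_bits: int = 24):
    """Normalize a quotient in [1/2, 2) to a frac_bits fraction in [1/2, 1).

    q represents q / 2**(quotient_bits - 1). Returns (fraction, exponent_adjust),
    where fraction / 2**frac_bits is in [1/2, 1) and exponent_adjust is added
    to the result exponent. Requires quotient_bits >= frac_bits + 2.
    """
    if quotient_bits < frac_bits + 2:
        raise ValueError("need at least two bits beyond the fraction")
    if q >> (quotient_bits - 1):       # integer bit set: value in [1, 2)
        exponent_adjust = 1            # one right shift, exponent + 1
        point = quotient_bits          # value = q / 2**point, now in [1/2, 1)
    else:                              # value already in [1/2, 1)
        exponent_adjust = 0
        point = quotient_bits - 1
    drop = point - frac_bits           # low-order bits to discard
    fraction = (q + (1 << (drop - 1))) >> drop   # round half up (assumed)
    if fraction >> frac_bits:          # rounding carried up to 1.0
        fraction >>= 1
        exponent_adjust += 1
    return fraction, exponent_adjust


# 1.5 -> 0.75 * 2**1
print(normalize_quotient(3 << 24))   # (12582912, 1)
```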


2.3.5 EMOD (Extended Precision Multiply and Integerize) 

The EMOD operation is partially done in the FPA. The FPA performs an unsigned 32 × 24-bit (64 × 56-bit
for double-precision) multiplication and returns the fraction result to the main machine. The main
machine does all further processing. The FPA EMOD operation can be broken into two stages: operand
load, and result calculation and return.


2.3.5.1 Operand Load — Loading the EMOD operands involves loading the multiplicand, an 8-bit 
multiplicand extension, and the multiplier into proper registers. The multiplicand (either single- or double- 
precision) is loaded into MC during A-Fork. In B-Fork, EMOD flows are started. These flows wait for the 
CPU to fetch the multiplicand extension (8 bits) and transmit it to the FPA via the ID bus. The FPA loads 
the extension into MCX, which is part of the MC1 register. The second operand is then transmitted to the
FPA and loaded into the appropriate multiplier registers, MP0 and MP1. The multiplier is not extended.
The FPA receives and stores the exponent and sign associated with both operands but does not use them. 


2.3.5.2 Result Calculation and Return — Once the operands are loaded, MCONT is asserted and the 
EMOD multiply begins. The operands are tested for zeros or reserved operands. If zeros are found, special 
flows stop the multiply and return a zero to the CPU. Finding reserved operands initiates error flows. If no 
exceptions are found, the multiply sequencer, started by MCONT asserted, continues multiplying. A 
single-precision (float) multiply is finished in one microstep after the exponent test. A double-precision 
multiply causes the FPA to enter a wait loop. It remains in the wait loop until the multiply sequencer 
asserts MUL/DIV DONE indicating the result is computed. 


When the result computation is finished, the fraction (32-bit float, 64-bit double) is transmitted to the 
CPU. The CPU does all further processing including sign computation, removal of the integer part, 
normalization, and exponent calculation. 
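
The operand widths can be illustrated by forming the extended multiplicand and the full product. This
minimal sketch assumes unsigned integer fractions of 24 bits (float) or 56 bits (double) as held in MC, with
the 8-bit extension appended below them as MCX; emod_fraction_product is a hypothetical helper. Which
product bits the FPA actually returns (32 for float, 64 for double) is not modeled, and the CPU-side sign,
integer-removal, and normalization steps are omitted.

```python
def emod_fraction_product(multiplicand: int, extension: int, multiplier: int,
                          double: bool = False) -> int:
    """Unsigned (multiplicand:extension) x multiplier, as a 32x24 or 64x56 multiply."""
    frac_bits = 56 if double else 24
    for name, value, bits in (("multiplicand", multiplicand, frac_bits),
                              ("extension", extension, 8),
                              ("multiplier", multiplier, frac_bits)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    extended = (multiplicand << 8) | extension   # MC with MCX appended below
    return extended * multiplier                 # multiplier is not extended


product = emod_fraction_product(0xFFFFFF, 0xFF, 0xFFFFFF)
print(product.bit_length())   # never more than 32 + 24 = 56 bits
```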
2.3.6 POLY (Polynomial Evaluation)
POLY is an FPA-implemented instruction. The FPA does the majority of calculations required to evaluate 
a polynomial expression. This involves: 


1. Storing a constant and an accumulation,
2. Receiving coefficients,
3. Repeating additions and multiplications using the constant, the accumulation, and the new
   coefficient, and
4. Readying a final result to be returned to the CPU.


It also uses specialized operations (both hardware and microcode) to ensure maximum accuracy within the 
FPA’s hardware limits. 


The following sections explain POLY flows, polynomial expressions, and POLY exceptions in detail, and 
define various terms. Also discussed are the numerous flows required to handle errors, underflows, 
overflows, and zeros. 


2.3.6.1 The Polynomial Expression — The generalized polynomial is written:

$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$$

The x, a constant within each polynomial, is called the argument and is raised to various powers:
$x^1, x^2, x^3, \ldots, x^n$. The highest power, represented here by the superscript $n$, is called the degree
of the equation. The $a_0, a_1, a_2, \ldots, a_n$ are the coefficients. Rearrangement and factoring produces

$$f(x) = a_0 + x\bigl(a_1 + x\bigl(a_2 + \cdots + x(a_{n-1} + x a_n)\bigr)\bigr)$$

The result, $f(x)$, may be computed as $a_n$ times x, then add $a_{n-1}$; the resulting answer times x, then
add $a_{n-2}$; and so on. The generalized form is: (accumulation times x) plus the new coefficient, $a_i$,
equals the new accumulation.


The POLY instruction format is POLY argument, degree, coefficient table. The FPA receives and stores the
argument. The CPU uses the degree operand to determine when it has accessed the last coefficient of the
table, so that it can inform the FPA that the POLY calculation is done. The coefficient table is arranged in
$a_n, a_{n-1}, a_{n-2}, \ldots, a_1, a_0$ order. The CPU transmits the coefficients to the FPA as needed:
$a_n$ first, $a_{n-1}$ next, and so on.
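
The factored form is Horner's rule, and the table order ($a_n$ first) is what lets it run as a single
accumulate loop. The following is a minimal sketch of the arithmetic only, assuming ordinary host
arithmetic in place of the FPA's rounding; the function name poly and its parameter order (argument,
degree, table) mirror the instruction's operands but are otherwise hypothetical.

```python
from typing import Sequence


def poly(argument: float, degree: int, table: Sequence[float]) -> float:
    """Evaluate a polynomial whose table is ordered a_n, a_(n-1), ..., a_0."""
    if len(table) != degree + 1:
        raise ValueError("table must hold degree + 1 coefficients")
    accumulation = table[0]                  # a_n
    for coefficient in table[1:]:            # a_(n-1) down to a_0
        # (accumulation times x) plus the new coefficient
        accumulation = accumulation * argument + coefficient
    return accumulation


# f(x) = 1*x**2 + 2*x + 3 at x = 2 -> 4 + 4 + 3 = 11
print(poly(2, 2, [1, 2, 3]))   # 11
```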


2.3.6.2 Normal POLY Flows — The FPA begins special POLY flows in B-Fork. The POLY argument is 
transferred to the FPA during A-Fork and then loaded into the argument registers. The argument fraction 
is loaded into MP, the exponent in XR, and the sign in SX. The argument remains in these registers 
throughout POLY execution. The FPA waits for the first coefficient to be sent so the POLY computation 
can begin. 


POLY computation can be divided into three large categories:

1. Argument and first coefficient handler
2. Generalized POLY computation (neither first term nor last term)
3. POLY DONE handler (handles A₀, the last coefficient)


This section discusses the flow according to these three categories. Within each category, microcode
controls the normal operations, checks for exceptional conditions, and attempts to recover from any
exceptional conditions. Refer to Figure 2-8 for a summary of the POLY flow.
POLY BEGINS WITH ARGUMENT IN AR, LA, AND SA

FIRST COEFFICIENT HANDLER
  * MOVE ARGUMENT TO REGISTERS
      MP <- AR             ARGUMENT FRACTION
      XR <- LA             ARGUMENT EXPONENT
      SX <- SA             ARGUMENT SIGN
  * IF ARGUMENT IS ZERO, FLOW REMAINS IN THIS HANDLER WAITING FOR THE LAST COEFFICIENT,
    WHICH WILL BE FLAGGED BY POLY DONE
  * WAIT FOR FIRST COEFFICIENT (POLY DONE EXIT)
  * MOVE COEFFICIENT TO REGISTERS
      MC,BR <- A(N)        COEFFICIENT FRACTION
      LB <- A(N)           COEFFICIENT EXPONENT
      SB <- A(N)           COEFFICIENT SIGN
      SA <- SB             TRANSFER COEFFICIENT SIGN
  * MULTIPLY COEFFICIENT AND ARGUMENT FORMING MULT. RESULT
      MULTIPLY FRACTIONS
      LA,PR <- XR+LB-128   ADD & ADJUST EXPONENTS
      SA <- SA.XOR.SX      COMPUTE SIGN
  * IF OVERFLOW/UNDERFLOW, ENTER GENERAL POLY FLOWS ATTEMPTING A RECOVERY
    (NORMAL ENTRY AND OVERFLOW/UNDERFLOW ENTRY INTO GENERAL POLY FLOWS)

GENERAL POLY FLOWS (NO POLY DONE)
  * WAIT FOR COEFFICIENT (POLY DONE EXIT)
  * MOVE COEFFICIENT TO REGISTERS
      AR <- MP*MC
      BR <- A(I)           COEFFICIENT FRACTION
      LB <- A(I)           COEFFICIENT EXPONENT
      SB <- A(I)           COEFFICIENT SIGN
  * ADD COEFFICIENT AND MULT. RESULT FORMING ACCUMULATION
      NR <- AR+BR          ADD FRACTIONS
      PR <- MAX(LA,LB)     SELECT EXPONENT
      MC <- NR             NORMALIZED
      PR <- PR             NORMALIZED
      SA <- SR             SIGN OF ACCUMULATION
  * IF OVERFLOW, ERROR
  * IF UNDERFLOW, ACCUMULATION IS SET TO ZERO
  * MULTIPLY ACCUMULATION AND ARGUMENT FORMING MULT. RESULT
      AR <- MP*MC          ARGUMENT * ACCUMULATION
      PR <- PR+XR-128      ADD & ADJUST EXPONENTS
      SA <- SA.XOR.SX      COMPUTE SIGN
  * IF OVERFLOW/UNDERFLOW, CONTINUE GENERAL POLY FLOWS ATTEMPTING A RECOVERY

LAST COEFFICIENT HANDLER (POLY DONE ASSERTED AND ARGUMENT OR DEGREE = 0)
  ANSWER IS JUST LAST COEFFICIENT
  * READY COEFFICIENT FOR RETURN
      PR <- LB             TRANSFER EXPONENT
      NR <- BR             TRANSFER FRACTION
      SA <- SB             TRANSFER SIGN
  * GO TO REGULAR STORE FLOWS
      NSHF <- NR           TRANSFER FRACTION
  ASSERT FPSYNC INDICATING ANSWER IS READY

LAST COEFFICIENT HANDLER (POLY DONE ASSERTED)
  * WAIT FOR COEFFICIENT
  * MOVE COEFFICIENT TO REGISTERS
      BR <- A(I)           COEFFICIENT FRACTION
      LB <- A(I)           COEFFICIENT EXPONENT
      SB <- A(I)           COEFFICIENT SIGN
  * ADD COEFFICIENT AND MULT. RESULT FORMING ACCUMULATION
      NR <- AR+BR          ADD FRACTIONS
      PR <- MAX(LA,LB)     SELECT EXPONENT
  * IF OVERFLOW, ERROR
  * GO TO REGULAR NORMALIZE FLOWS
      NSHF <- NR           NORMAL FRACTION
      PR <- PR             ADJUST EXPONENT
      SA <- SR             SIGN OF RESULT
  ASSERT FPSYNC INDICATING ANSWER IS READY


Figure 2-8 The POLY Flow


Within the flows, different microcode instructions handle float and double-precision operation. In POLY
double, the coefficient, argument, and accumulation fractions each use an additional 32 low-order bits.
The differences between float and double precision are not discussed for each operation because they are
normally limited to longer fraction multiply times and slower fraction transfers, which come about because
there are more bits to be multiplied and moved.


When the first coefficient, Aₙ, is sent, it is loaded into MC (and BR), LB, and SB. Since the argument has not
yet been checked, both the argument and the coefficient are checked for exception conditions, and then
POLY DONE is checked. If any exception condition is noted, special flows are accessed. POLY DONE asserted
indicates that the coefficient just sent was the final coefficient (in this case, the first coefficient is also the
last coefficient). If the argument (x) is zero, all terms of the polynomial except the A₀ term are zero. The
FPA accesses a special routine that returns A₀ to the CPU as the result of the polynomial calculation.


After both the argument and the coefficient are checked and no exception conditions are found, the first
multiply takes place. While the fractions are multiplied in the fraction multiply logic (FML and FMH), the
exponents are added and adjusted in the FCT to restore excess-80₁₆ notation, and the sign of the result is
computed (also in the FCT). When the multiply is done, the fraction is moved to AR for the addition
operation. To maximize calculation accuracy, no normalization is performed after the multiplication, and
eight additional low-order fraction bits are transferred to the AR register and stored in ARX. These eight
bits are used when the new coefficient is added to the multiplication result to produce the new accumulation.


While the multiplication fraction result is being transferred to AR, the exponent result is checked for 
exponent overflow or underflow. If no overflow or underflow is found, the addition begins as soon as the 
new coefficient data is ready. If, however, overflow or underflow is sensed, special flows that attempt to
recover from the overflow or underflow are accessed (Section 2.3.6.3). 
While the new coefficient data is checked for zero and reserved operands, the addition/subtraction begins 
on the assumption that the coefficient data is valid. The exponent difference hardware selects the larger 
exponent for processing by the FCT and loads it into PR. It also shifts and loads the fraction associated 
with the smaller exponent into the B-input of FALU. FALU then adds or subtracts the fractions. When the
coefficient data proves valid, the computed fraction result is transferred to NR where it can be 
normalized. 
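
The exponent-difference step can be sketched as follows, assuming unsigned fraction magnitudes held as
integers and biased exponents; sign handling (add versus subtract), the ARX extension bits, and
normalization are deliberately left out, and align_and_add is a hypothetical name. The larger exponent is
kept (the value loaded into PR), and the other fraction is shifted right by the exponent difference before
the add, as on the path into the FALU B-input.

```python
def align_and_add(exp_a: int, frac_a: int, exp_b: int, frac_b: int):
    """Return (exponent, unnormalized fraction sum) for two positive operands."""
    if exp_a >= exp_b:
        exponent, larger, smaller, diff = exp_a, frac_a, frac_b, exp_a - exp_b
    else:
        exponent, larger, smaller, diff = exp_b, frac_b, frac_a, exp_b - exp_a
    # Shift the fraction with the smaller exponent right to align radix points.
    return exponent, larger + (smaller >> diff)


# 0.5 * 2**2 + 0.5 * 2**0 = 2.5 = 0.625 * 2**2 (24-bit fractions, excess-128)
exponent, fraction = align_and_add(0x82, 0x800000, 0x80, 0x800000)
print(hex(exponent), hex(fraction))   # 0x82 0xa00000
```

A carry out of the top fraction bit (a sum of 1 or more) is simply returned; shifting it back is the job of
the normalize step.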
The fraction normalization takes place in the FNM logic. A rounding byte is added and the result is 
shifted until normalized. The exponent is adjusted based on both the rounding byte and the number of 
shifts required to normalize the fraction. The normalized fraction is moved to MC and a multiply with the 
stored argument (x) begins. 
Once the first multiply is completed, the POLY calculation is in the general POLY flow. These flows:

1. Multiply the result of the last add and normalize by the argument (x),
2. Receive a new coefficient from the CPU,
3. Check the coefficient for exceptional conditions,
4. Add the coefficient to the result of the multiply operation,
5. Normalize the result of the addition, and
6. Ready the result for the next multiply.


The general POLY flows check the intermediate results for overflow, underflow, and zeros, and access 
special flows if an exception is found. 
The general POLY flow continues until the CPU sends a coefficient flagged with POLY DONE rather
than CP SYNC. This indicates that the coefficient just transmitted is the final coefficient in the table. The
POLY DONE flow adds the final coefficient and then accesses the normalization flows in the FPA
addition flows. These flows round and normalize the fraction and adjust the exponent based on the
rounding byte and normalization shift. Once the result is complete, it is placed on FP bus A and
standard routines handle the transfer to the CPU.
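
The division of labor between the CPU and the FPA can be modeled as two cooperating routines: the CPU
walks the table and flags the last transfer with POLY DONE (using the degree operand), while the FPA runs
the first-coefficient, general, and POLY DONE handlers. This is a behavioral sketch only, assuming host
arithmetic and ignoring normalization, rounding, and every exception flow except the zero-argument
shortcut; cpu_send and fpa_poly are hypothetical names.

```python
def cpu_send(degree: int, table):
    """CPU side: yield (coefficient, poly_done) in table order a_n ... a_0."""
    if len(table) != degree + 1:
        raise ValueError("table must hold degree + 1 coefficients")
    for index, coefficient in enumerate(table):
        yield coefficient, index == degree     # last transfer flagged POLY DONE


def fpa_poly(argument, transfers):
    """FPA side: first-coefficient, general, and POLY DONE handlers."""
    transfers = iter(transfers)
    coefficient, done = next(transfers)         # first coefficient handler
    if argument == 0:
        while not done:                         # wait for the POLY DONE term
            coefficient, done = next(transfers)
        return coefficient                      # answer is just a_0
    if done:                                    # degree 0: answer is a_0
        return coefficient
    mult_result = coefficient * argument        # first multiply
    while True:                                 # general POLY flows
        coefficient, done = next(transfers)
        accumulation = mult_result + coefficient
        if done:                                # POLY DONE handler: final add
            return accumulation
        mult_result = accumulation * argument


print(fpa_poly(2, cpu_send(2, [1, 2, 3])))   # 11
print(fpa_poly(0, cpu_send(2, [1, 2, 3])))   # 3
```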


2.3.6.3 POLY Exception Flows - The POLY flows have many special sections to check for and handle 
exceptional conditions. Each coefficient is checked for zeros and reserved operands. The POLY argument 
is checked for zero. The CPU checks the argument and degree for reserved operands. The FPA also
checks the intermediate results for underflow, zero, and overflow. If an underflow or overflow is detected, 
special flows attempt to recover from the condition without a loss of accuracy. 


The exception flows (zero, reserved operand, overflow, and underflow) can be divided into three categories 
to handle exceptions discovered during: 


1. First coefficient and argument handling, 
2. General coefficient handling, and 
3. POLY DONE (final coefficient) handling. 


Within each category, different microcode handles float and double-precision operation. However, there is 
little difference between the exception procedures used in each category and only minor differences in the 
microcode. As a result, each individual exception flow is not discussed. Rather, the microcode procedure 
for each type of exception is explained. 


ZEROS 

The argument and each coefficient are checked for zeros. The argument and first coefficient are checked 
for zeros at the start of the POLY flow. If the argument (x) is zero, all the terms of the polynomial are 
zero except A₀, the last coefficient. With the argument equal to 0, the FPA remains in the first coefficient
loop waiting for the last coefficient (flagged by POLY DONE). When it is received, it is tested for 
reserved operand and, if not reserved, is returned to the CPU as the result of the polynomial. If the first 
coefficient is zero, the accumulation registers are set to zero and the FPA waits for the next coefficient. 


If a zero is found as a subsequent coefficient (when the current accumulation is not zero), the current 
accumulation which is unnormalized is rounded and normalized, and the FPA waits for the next 
coefficient. 


RESERVED OPERAND 

Each coefficient is checked by FPA hardware for reserved operand. If a reserved operand is found, the 
POLY operation is immediately aborted and the accelerator error bit is set. The argument is not checked 
for reserved operand by the FPA because it is checked in the CPU and, if found to be reserved, the POLY 
operation never starts in the FPA.


OVERFLOW 
The FPA checks for overflow by examining the exponent bits PR8 and PR9 in the PR register. If PR8 
(the overflow bit) is high and PR9 is low, an overflow occurs. 


The FPA checks each current accumulation two times per cycle for an overflow condition — once when the 
unnormalized multiplication result is readied for adding the new coefficient, and once after the addition 
result has been rounded and normalized. If an overflow is detected in the second instance (normalized 
addition result overflow) the FPA aborts. The FPA sets the PSL V (overflow) bit and waits until the CPU 
traps it back to IRD. 
If the unnormalized multiplication result overflows, the FPA accesses overflow routines in an attempt to
recover an accurate result from the overflow. The FPA microcode is written on the assumption that
combining the new coefficient with the overflowed accumulation may produce a result small enough that
the exponent no longer overflows (PR8 will be low). As stated before, PR8 is high. This means the
exponent in PR<8:0> is 10XXXXXXX (9 bits long). Since the exponent difference taker (EALU) is only eight
bits long, the overflowed exponent must be scaled down. The FPA subtracts 80₁₆ to scale it down.


The new coefficient is first checked for zero or reserved operands. A reserved operand causes an abort. If 
the coefficient is zero, it does not change the overflow. The FPA attempts to recover from the overflow by 
first adding back the 80₁₆ to return the exponent to the correct value, then normalizing and rounding. If
this fails, the FPA sets the overflow bit and aborts. 


If the new coefficient is not zero or reserved, the operation continues. The FPA subtracts 80₁₆ from the
exponent of the coefficient to scale it down. The reduced-exponent coefficient is checked for underflow. If
an underflow is sensed, the coefficient is effectively zero when compared with the accumulation. Since the
coefficient is effectively zero, the FPA attempts to recover from the overflow by first adding back the
80₁₆ to return the exponent to the correct value, then normalizing and rounding. If this fails, the FPA sets
the overflow bit and aborts.


If the reduced coefficient did not underflow, the coefficient can affect the accumulation and possibly
recover it from the overflow condition. In the accumulation overflow flows, the accumulation is the larger
number. Therefore, no checks are performed on the exponents to find the larger number. The exponent
difference taker then subtracts the two scaled-down exponents to determine how many places the
coefficient must be shifted to align the radix points. To set up the POLY add/subtract, the accumulation
fraction is moved through the ADER MUX to the FALU, and the restored (80₁₆ added back) accumulation
exponent is moved to PR for processing.


The POLY add/subtract takes place. The fraction result is moved to NR where it is normalized and 
rounded. The result exponent, formerly the accumulation exponent, is adjusted based on the fraction 
normalization and rounding. The result is checked for overflow and underflow. As stated at the beginning 
of this overflow section, an overflow after the normalization and rounding operation causes the FPA to 
assert the overflow “V” bit and abort. 
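
The decision made by the overflow flows can be sketched as follows. This is a minimal model, assuming
exponents as plain integers, an overflowed PR<8:0> of the form 10XXXXXXX (256 to 383), and that the
scaled-down coefficient "underflows" by the same rule used for PR (a result of zero or less);
overflow_recovery_setup is a hypothetical name. Subtracting 80₁₆ from both exponents leaves their
difference, and therefore the alignment shift, unchanged while bringing the accumulation exponent into the
8-bit range of the EALU.

```python
BIAS = 0x80  # scale factor used by the overflow flows


def overflow_recovery_setup(acc_exp: int, coef_exp: int):
    """Decide how an overflowed multiply result meets the next coefficient.

    acc_exp:  overflowed exponent from PR, with PR<8:0> = 10XXXXXXX.
    coef_exp: 8-bit biased exponent of a nonzero, non-reserved coefficient.
    Returns ("negligible", None) when the scaled-down coefficient underflows,
    otherwise ("align", shift): the right shift applied to the coefficient
    fraction. The accumulation is taken to be the larger operand.
    """
    if not 0x100 <= acc_exp < 0x180:
        raise ValueError("expected PR<8:0> of the form 10XXXXXXX")
    scaled_acc = acc_exp - BIAS       # now 0x80..0xFF: fits the 8-bit EALU
    scaled_coef = coef_exp - BIAS     # may drop to zero or below
    if scaled_coef <= 0:              # assumed underflow test (zero or less)
        return "negligible", None     # recover: add 80h back, normalize, round
    return "align", scaled_acc - scaled_coef


print(overflow_recovery_setup(0x105, 0x80))   # ('negligible', None)
print(overflow_recovery_setup(0x105, 0xFF))   # ('align', 6)
```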


UNDERFLOW 

The FPA can handle numbers as small as 0.29 × 10⁻³⁸. A number smaller than this causes an underflow.
The FPA checks for underflow by examining the exponent register, PR. In an underflow, PR9 is high or
PR<8:0> is all zeros.
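
The overflow and underflow tests on PR can be written as a small classifier. This sketch assumes PR is a
10-bit value (PR9 through PR0) in which PR9 high marks a negative, underflowed exponent; classify_pr is a
hypothetical name. The smallest-magnitude figure assumes the VAX F_floating format, whose smallest
normalized value is 0.5 × 2^(1-128) = 2^-128.

```python
def classify_pr(pr: int) -> str:
    """Classify a 10-bit PR exponent value (bits PR9..PR0)."""
    pr &= 0x3FF                       # keep PR<9:0>
    pr9 = (pr >> 9) & 1
    pr8 = (pr >> 8) & 1
    if pr9 or (pr & 0x1FF) == 0:      # negative, or PR<8:0> all zeros
        return "underflow"
    if pr8:                           # PR8 high with PR9 low
        return "overflow"
    return "normal"


# Smallest normalized magnitude, assuming F_floating: 0.5 * 2**(1 - 128)
SMALLEST_F = 2.0 ** -128

print(classify_pr(0x081), classify_pr(0x105), classify_pr(-3 & 0x3FF))
print(SMALLEST_F)   # about 2.94e-39, i.e. 0.29 x 10**-38
```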


Underflow is not as serious a fault as overflow. An underflow means the result just checked is so close to 
zero that the FPA cannot accurately represent it. When encountered, the FPA sets the ACC ZDATA bit 
and special flows attempt to recover the number. If the underflow result cannot be recovered, the number 
is set to zero and the FPA operation continues. After the POLY operation is completed, the CPU traps on 
underflow if bit 6 (floating underflow) of the PSL is set. 


The FPA checks for accumulation underflow twice per POLY cycle: once as the unnormalized multiplication
result is readied for the following addition, and once after the result of the addition has been
normalized and rounded. If an underflow is detected in the normalized addition result, no result recovery is 
possible. The FPA merely sets the accumulation to zero, informs the CPU of the underflow, and continues 
the operation. 


If an underflow is detected after the multiplication, special flows are accessed to save the result. In an
underflow, the exponents of both the accumulation and the coefficient must be scaled up so the exponent
difference can be taken with an 8-bit exponent processor. The scale factor is 80₁₆.
The coefficient is first checked for zero or reserved operands. A reserved operand causes an abort. A zero
coefficient does not change the underflow so the FPA attempts to recover by normalizing and rounding. If 
this fails, the accumulation is cleared (set to zero) and the FPA operation continues. 


If the new coefficient is not zero or reserved, the operation continues. The FPA adds 80₁₆ to both
exponents to scale them up. If the coefficient exponent overflows when it is scaled up, the coefficient is so
much larger than the accumulation that the accumulation does not affect the coefficient. The FPA
disregards the accumulation and makes the new coefficient the accumulation. The FPA does this by
subtracting the 80₁₆ just added to the coefficient exponent and moving the coefficient to the registers
formerly holding the underflow accumulation.


If the new coefficient does not overflow, it shows that the coefficient can affect the accumulation and the 
exponent difference taker determines the exponent difference. Since the coefficient is the larger number, 
the coefficient fraction is moved through the ADER MUX to the FALU and the coefficient exponent is 
stored in PR after the bias previously added is removed. The accumulation fraction is shifted, based on the 
exponent difference, until the radix points align, and is then added or subtracted. The result is rounded and 
normalized in the normalize logic. The coefficient exponent (stored in PR) is adjusted based on the 
fraction normalization and rounding, and becomes the accumulation exponent. The rounded result is 
checked for underflow. If underflow is detected, the ACCZ bit is set and a zero is stored. The FPA 
informs the CPU that an underflow has occurred by asserting both FP SYNC and ERR SYNC. In any 
case, the polynomial operation continues. 
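
The underflow flows mirror the overflow flows with the bias added instead of subtracted. This minimal
sketch assumes exponents as plain integers, an underflowed accumulation exponent of zero or less, and a
coefficient exponent in the 8-bit biased range; underflow_recovery_setup is a hypothetical name.

```python
BIAS = 0x80  # scale factor used by the underflow flows


def underflow_recovery_setup(acc_exp: int, coef_exp: int):
    """Decide how an underflowed multiply result meets the next coefficient.

    acc_exp:  underflowed exponent from PR (zero or negative).
    coef_exp: 8-bit biased exponent (1..0xFF) of a nonzero, non-reserved
              coefficient.
    Returns ("replace", exponent) when the scaled-up coefficient overflows,
    meaning the coefficient simply becomes the accumulation; otherwise
    ("align", shift): the right shift applied to the accumulation fraction.
    """
    if acc_exp > 0:
        raise ValueError("accumulation exponent has not underflowed")
    scaled_acc = acc_exp + BIAS       # back into the 8-bit range
    scaled_coef = coef_exp + BIAS
    if scaled_coef > 0xFF:            # coefficient overflows when scaled up
        return "replace", scaled_coef - BIAS   # bias removed again
    return "align", scaled_coef - scaled_acc   # coefficient is the larger


print(underflow_recovery_setup(-10, 0x90))   # ('replace', 144)
print(underflow_recovery_setup(-10, 0x01))   # ('align', 11)
```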
CHAPTER 3
LOGIC DESCRIPTION 


3.1 INTRODUCTION 

This chapter contains detailed information about the hardware operation of the FPA. In Section 3.2, the 
FPA is sectioned into hardware blocks and the operation of each block is discussed. The microcode 
sections (Sections 3.3 through 3.5) summarize both the FPA microcode and the FPA-specific microcode 
in the CPU. This discussion focuses on the generation and monitoring of the various control signals passed 
between the units. 


3.2 BLOCK DIAGRAM AND UNIT DESCRIPTION 

This section provides a functional description of each area of the FPA with relation to the control store and 
instruction execution. Discussions of logic unit operations are included for areas that require further 
clarification. 


The FPA can be divided into three areas. The first area contains two interface sections: the CPU-FPA
interface and the FPA internal buses (which interface between the various sections of the data
manipulation area). The second area, data manipulation, contains five sections: fraction adder/subtractor,
fraction normalizer/divider, fraction multiplier, exponent processor, and sign processor. Each section in this
area operates as an independent unit, capable of processing data in parallel with operations being performed in
other sections. The third area contains only the control store and logic which controls both interfacing and 
data manipulation. Refer to Figure 3-1. 


The CPU transmits both data and instructions to the FPA. The instructions are decoded in the control- 
store logic and access an FPA control-store word. The FPA control-store word controls the transfer of the 
data on the FPA internal buses and the operation of the various data manipulation sections. The various 
data manipulation sections perform the required operations. The resulting answer is formatted and sent to 
the CPU-FPA interface. A signal from the FPA informs the CPU that the answer is available at the 
interface. 


Each of the eight sections mentioned in this section is discussed individually in the following sections. Each 
discussion includes an explanation of pertinent control-store fields and a description of the hardware 
operation as controlled by the control store, CPU instruction, data characteristics, and both internal and 
external flags. 


3.2.1 CPU-FPA Interface 

The CPU and FPA have numerous interconnections. They exchange data, instruction information, device 
control signals, and status information over buses and individual signal lines. There are three types of 
information transferred via the CPU-FPA interface. They are as follows. 


1. Control and status 
2. Data 
3. Trap and diagnostic 


They are discussed in the above order in the following sections. Refer to Figure 3-2 for a summary of the 
CPU-FPA interface. 
Figure 3-1 FPA Block Diagram
Figure 3-2 CPU-FPA Interface


3.2.1.1 CPU-FPA Control and Status - The FPA and CPU work interactively. This means they are 
constantly exchanging status and control information, and that operations in one unit can and do affect 
operations in the other unit. The status register (ID register 17) provides some CPU control of the FPA. 
Bit 15 of the status register is used by the CPU to enable the FPA. The CPU can disable all FPA outputs 
and effectively remove the FPA from the computing system by clearing bit 15. Refer to Figure 3-3 and
Table 3-1 for a complete description of this register. 


STATUS REGISTER
ID REGISTER #17

 31      30:28    27          26:16    15      14:4     3:0
 ACC     not      MINUS ZERO  not      ACC     not      ACC
 ERROR   used     ERROR       used     EN      used     TYPE


Figure 3-3 Status Register
Table 3-1 The Status Register


Bit 31       Accelerator Error (also called ACC ERROR or Error Sync)
             Access: Written by FPA; read by CPU.
             Function: Set when the FPA detects an exception condition.

Bits 30-28   Not used; set to zero.

Bit 27       Minus Zero Error
             Access: Written by FPA; read by CPU.
             Function: Set when the FPA encounters a reserved operand or generates an overflow.
             Setting this bit sets Accelerator Error.

Bits 26-16   Not used; set to zero.

Bit 15       Accelerator Enable
             Access: Written by CPU; read by FPA.
             Function: When clear, all FPA outputs are disabled. This removes the FPA from the
             computing system. Must be set for normal FPA outputs.

Bits 14-4    Not used; set to zero.

Bits 3-0     Accelerator Type
             Access: Hardwired in the FPA; read by CPU.
             Function: A hardwired code identifies the type of accelerator installed in the
             backplane slots. The FPA code is 0001.
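
The field layout in Table 3-1 can be expressed as a small decoder. This is a sketch only: the FpaStatus
class and decode_status function are hypothetical names, and the example register value is made up for
illustration, not read from hardware.

```python
from dataclasses import dataclass


@dataclass
class FpaStatus:
    acc_error: bool          # bit 31
    minus_zero_error: bool   # bit 27
    acc_enable: bool         # bit 15
    acc_type: int            # bits 3:0 (0001 identifies the FPA)


def decode_status(value: int) -> FpaStatus:
    """Split a 32-bit status register (ID register 17) image into fields."""
    return FpaStatus(
        acc_error=bool((value >> 31) & 1),
        minus_zero_error=bool((value >> 27) & 1),
        acc_enable=bool((value >> 15) & 1),
        acc_type=value & 0xF,
    )


print(decode_status(0x00008001))
# FpaStatus(acc_error=False, minus_zero_error=False, acc_enable=True, acc_type=1)
```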


The FPA also receives control and status information from the CS bus. The functions of these lines are 
summarized in Table 3-2. 


Op code information (operation and precision) is transmitted to the FPA from the instruction buffer via 
IRC OPC lines 7 to 0. These lines, from byte 0 of the instruction buffer, are used by the A-Fork/B-Fork 
logic and BEN logic for FPA control-store next-address generation. A few other lines from the instruction 
buffer and decode logic provide specifier source information to the FPA. The possible sources of data are: 


Memory,
Register,
Short literal, and
Long literal.




The CPU-FPA interface includes clock signals from the CPU to the FPA. The units operate synchronously on
a 133.3 nanosecond (ns) cycle, and the T0 points of both units coincide.
The FPA transmits two status signals to the CPU: FP SYNC and ACC ERROR. These signals are input
to the CPU for branch control. FP SYNC is normally asserted when an FPA result is available to the 
CPU. ACC ERROR is set during an FPA error condition. 


Table 3-2 CS Lines


CS BUS
71 70   Name           Function
0  0    NOP
1  0    ACC TRAP       Initiates an accelerator trap.
0  1    CP SYNC        Indicates the CPU has received FPA data or the CPU is presenting
                       valid data to the FPA.
1  1    Redefine uSI   Decodes CS lines 57, 56, and 55 for more information.

CS BUS
57 56 55   Name        Function
1  1  0    Poly End    Indicates the last term of the polynomial has been transmitted
                       from the CPU.
1  1  1    FP TRAP     Initiates an FPA trap.
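
Table 3-2 is a two-level decode: CS lines 71 and 70 first, then CS lines 57, 56, and 55 when both are set. A
minimal sketch, with decode_cs as a hypothetical name; codes not listed in the table are reported as such
rather than guessed.

```python
def decode_cs(cs71: int, cs70: int, cs57: int = 0, cs56: int = 0, cs55: int = 0) -> str:
    """Return the Table 3-2 name for a CS bus code."""
    primary = {(0, 0): "NOP", (1, 0): "ACC TRAP", (0, 1): "CP SYNC"}
    if (cs71, cs70) in primary:
        return primary[(cs71, cs70)]
    # CS<71:70> = 11 (Redefine uSI): decode CS lines 57, 56, and 55 instead.
    secondary = {(1, 1, 0): "Poly End", (1, 1, 1): "FP TRAP"}
    return secondary.get((cs57, cs56, cs55), "not listed in Table 3-2")


print(decode_cs(0, 1))            # CP SYNC
print(decode_cs(1, 1, 1, 1, 0))   # Poly End
```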


3.2.1.2 CPU-FPA Data - The FPA receives operand data from the CPU and, after performing the 
required operation, returns the answer to the CPU. The data is transmitted to the FPA via the ID bus and 
is returned to the CPU via the DF multiplexer bus. As mentioned previously, the FPA does not do any 
memory accessing. The CPU must calculate the data memory address, access the address, and place the 
data on the ID bus to the FPA. 


The FPA is optimized to use CPU scratchpad register data. It stores two copies of the 16 CPU scratchpad 
registers. To ensure that the FPA copies are exact copies, the FPA copies are addressed and written by the 
same lines that address and write the CPU general registers. The address lines are from the DAP board 
and the data is transmitted via the DF multiplexer bus. To ensure that a changing register is never read, 
the CPU updates the general register and the FPA copies between T66.6 and T133.3 (T0), and the FPA
reads the copies between T0 and T66.6. Note that the FPA general register copies are write-only memory
to the CPU and read-only memory to the FPA. This means that results of FPA operations that are 
destined for the general-register set are transmitted back to the CPU via the DF multiplexer bus. These 
results are then written into the general-register set under CPU control rather than written directly into the 
general-register copies by the FPA. 


The data stored in the FPA general-register copies is read by the FPA using address lines from the 
instruction-buffer operand-source logic. This scheme enables the FPA to access register data and begin the 
operation as soon as the general-register address(es) is (are) in the instruction buffer. 


All operands other than register operands are transmitted to the FPA via the ID bus. This includes 
memory data and long and short literals. When memory data is specified in an instruction, the CPU
fetches it and places it in the CPU D-register. The contents of the D-register are placed on the ID bus and,
in the FPA, are transferred from the ID bus directly onto the FP buses. Since the D-register and ID bus
are each only 32 bits wide, two transfers are needed to transmit a double-precision number. With
single-precision (float) literal data, part of the instruction stream is transferred from the instruction buffer
onto the ID bus. In the FPA, single-precision literal data is latched into the literal register (LR) and then
placed on the FP bus. The most significant part of double-precision literal data is handled similarly, that is,
IB -> ID bus -> LR -> FP buses. The least significant part of a double-precision literal is transferred from the
instruction buffer over the ID bus to the CPU D-register, then back on the ID bus and onto the FP buses. 
Note that no ID bus addresses are required for data transfers over the ID bus. The FPA simply accepts the 
current ID bus data. 


When the FPA operation result is ready to be transmitted to the CPU, FP SYNC is asserted and the 
single-precision result or the most significant part of a double-precision result is on FP bus A. The CPU 
responds to FP SYNC by enabling the FPA DF multiplexer bus drivers which place the FP bus A contents 
on the DF multiplexer bus. The FPA result is transferred to the CPU D-register via the DF mux bus. 
When the CPU has the data, it asserts CP SYNC. This ends a single-precision (float) transfer or enables
the second part of a double-precision transfer. For a double-precision transfer, the second part is placed on
FP bus A and remains there until the CPU responds to the newly asserted FP SYNC by enabling the DF
multiplexer bus drivers, accepting the data, and asserting CP SYNC to indicate it has the data.
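The two-part handshake can be modeled in a few lines. The sketch below is a behavioral simulation only, under stated assumptions: `split_double` and `return_result` are hypothetical names, signal timing is ignored, and a double-precision result is represented as a 64-bit integer whose upper 32 bits stand in for the "most significant part".

```python
def split_double(value64):
    """Split a 64-bit quantity into [most significant, least significant] 32-bit parts.

    Treating the upper 32 bits as the 'most significant part' is a modeling assumption.
    """
    return [(value64 >> 32) & 0xFFFFFFFF, value64 & 0xFFFFFFFF]


def return_result(parts):
    """Simulate the FP SYNC / CP SYNC exchange for a float (one part) or a double (two parts).

    Returns the values the CPU D-register receives, in order, plus an event log.
    """
    d_register_loads, log = [], []
    for part in parts:
        fp_bus_a = part                      # FPA drives this part onto FP bus A
        log.append("FP SYNC")                # FPA: result (or next part) is ready
        df_mux_bus = fp_bus_a                # CPU enables the FPA DF mux bus drivers
        d_register_loads.append(df_mux_bus)  # CPU loads its D-register from the DF mux bus
        log.append("CP SYNC")                # CPU: data taken (ends float, or enables part 2)
    return d_register_loads, log


# Float result: one exchange. Double result: two exchanges, most significant part first.
print(return_result([0x12345678]))
print(return_result(split_double(0x0123456789ABCDEF)))
```

The model makes the key property visible: a float needs one FP SYNC/CP SYNC pair, a double needs two, and the CPU always receives the most significant part first.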


While the FPA is transmitting the result back to the CPU, valid condition codes are also being transmitted 
to CPU condition-code latches. These latches are read during the next machine cycle. The N, V, and Z 
bits are set based on the status of the result. The C bit is always cleared by the FPA. 


3.2.1.3 CPU-FPA Trap and Diagnostic - The FPA contains several features to facilitate error diagnosis 
and troubleshooting. These include programmable traps, microdiagnostics, special maintenance features, 
and the visibility bus. 


The CPU can initiate two types of traps: ACC TRAP and FP TRAP. CS 71 high and CS 70 low initiate 
an ACC TRAP. This causes the FPA to access one of the FPA microcode addresses 0 through 7 as 
selected by CS lines 57, 56, and 55. Currently, only two of these traps are used: accelerator power-up trap 
(address 0) and accelerator abort trap (address 2). The FP TRAP, used for FP microdiagnostics, is 
selected by CS lines 71, 70, 57, 56, and 55 high. When FP TRAP is asserted, the FPA microcode address 
is selected by bits 23 through 16 of the maintenance register. The trap address (0 through 255 in the 
microcode) is selected by the data previously loaded into the maintenance register. 
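As a rough illustration, the trap selection rules above can be written as a decode function. The set-of-high-lines input and the function name `select_trap` are hypothetical; combinations the text does not describe simply return `None` here.

```python
ACC_TRAP_NAMES = {0: "accelerator power-up trap", 2: "accelerator abort trap"}


def select_trap(cs_high, maint_reg):
    """Decode a trap request from the control-store (CS) lines that are high.

    cs_high:   set of CS line numbers currently high (hypothetical encoding).
    maint_reg: maintenance register contents (ID bus register 16).
    Returns (trap_type, fpa_microcode_address), or None for combinations not covered.
    """
    def hi(line):
        return 1 if line in cs_high else 0

    if hi(71) and hi(70) and hi(57) and hi(56) and hi(55):
        # FP TRAP: address 0-255 comes from maintenance register bits <23:16>.
        return ("FP TRAP", (maint_reg >> 16) & 0xFF)
    if hi(71) and not hi(70):
        # ACC TRAP: CS 57, 56, 55 select FPA microcode address 0-7.
        return ("ACC TRAP", (hi(57) << 2) | (hi(56) << 1) | hi(55))
    return None


trap = select_trap({71, 56}, maint_reg=0)
print(trap, ACC_TRAP_NAMES.get(trap[1]))   # ACC TRAP to address 2, the abort trap
```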


The maintenance register is a CPU-FPA readable/writable register located on the ID bus. The CPU 
accesses this register as ID bus register 16. The register is designed to facilitate maintenance. As discussed 
previously, it contains the FP trap diagnostic address. Using the trap address the CPU can exercise various 
sections of FPA logic. Bit 14 of this register provides a sync pulse that can be used for troubleshooting
with an oscilloscope. This bit goes high each time the FPA accesses the microcode address stored in bits 8
through 0. Refer to Figure 3-4 and Table 3-3 for a summary of this register.


Forty-three FPA signals are accessed by the visibility bus (V bus). The V bus is a diagnostic tool designed 
to allow polling of stable internal CPU (in this case, FPA) signals. The console can issue commands that 
load the V bus latches with the signals monitored, and then shift the loaded latches one bit at a time to a 
control word located in the console interface. At the console, the data shifted in is examined by diagnostic 
software. There are eight data input channels on the V bus; channel 6 is devoted to the FPA. Refer to
Table 3-4 for a listing of the FPA signals that are available to the V bus.
Figure 3-4 Maintenance Register (ID register #16: WRITE TRAP ADDRESS <31>, zero <30:24>, TRAP ADDRESS
<23:16>, WRITE MICRO BREAK <15>, MICRO MATCH <14>, zero <13:09>, MICRO BREAK/CURRENT ADDRESS <08:00>)


Table 3-3 The Maintenance Register


Bit No.  Name, Access, and Function

31       Write Trap Address
         Access: Write by CPU, read by FPA.
         Function: When set (by CPU), enables CPU to write trap address (bits <23:16>).

30-24    Not Used (set to zero)

23-16    Trap Address
         Access: Write/read by CPU, read by FPA.
         Function: Selects FPA microcode address for FPA microdiagnostics.

15       Write Microbreak
         Access: Write by CPU, read by FPA.
         Function: When set (by CPU), enables CPU to write microbreak (bits <8:0>).

14       Micromatch
         Access: Write by FPA, read by CPU.
         Function: Set by FPA when the currently accessed FPA microcode address equals the
         address stored in microbreak (bits <8:0>).

13-9     Not Used (set to zero)

8-0      Microbreak/Current Address
         Access: CPU writes microbreak; FPA reads microbreak. FPA writes current FPA
         microcode address; CPU reads current FPA microcode address.
         Function: These bits serve two functions:
         1. The microbreak selects the FPA microcode address to be monitored for
            micromatch (bit 14).
         2. The current address provides CPU monitoring of FPA microcode activity.
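A minimal sketch of how diagnostic software might compose and pick apart the maintenance register, using the bit positions in Table 3-3. The helper names are hypothetical, and the read-back interpretation assumes bits <8:0> return the current FPA microcode address, as the table describes.

```python
WRITE_TRAP_ADDRESS = 1 << 31   # bit 31
WRITE_MICROBREAK = 1 << 15     # bit 15
MICROMATCH = 1 << 14           # bit 14


def cpu_write_value(trap_address=None, microbreak=None):
    """Build a longword for the CPU to write to ID register 16 (hypothetical helper).

    Setting bit 31 or bit 15 enables the corresponding field to be written;
    the unused bits <30:24> and <13:9> are left zero.
    """
    value = 0
    if trap_address is not None:
        if not 0 <= trap_address <= 0xFF:
            raise ValueError("trap address is 8 bits (<23:16>)")
        value |= WRITE_TRAP_ADDRESS | (trap_address << 16)
    if microbreak is not None:
        if not 0 <= microbreak <= 0x1FF:
            raise ValueError("microbreak is 9 bits (<8:0>)")
        value |= WRITE_MICROBREAK | microbreak
    return value


def decode_read(value):
    """Split a longword read back by the CPU into its maintenance fields."""
    return {
        "trap_address": (value >> 16) & 0xFF,
        "micromatch": bool(value & MICROMATCH),
        "current_address": value & 0x1FF,   # FPA's current microcode address on a read
    }


print(hex(cpu_write_value(trap_address=0x12, microbreak=0x1A5)))
```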


Table 3-4 Signals Monitored by Visibility Bus


FCTE SHF COUNT 5 H          FCTD EALU 0 L
FCTE SHF COUNT 4 H          FCTE COMPL L
FCTE SHF COUNT 3 H          FADR SFC (0) H
FCTE SHF COUNT 2 H          FNMS EALU CIN L
FCTE SHF COUNT 1 H          FCTC SEL NORM H
FCTE SHF COUNT 0 H          FCTP RA ADRS 3 L
FCTN FALU CARRY INH         FCTP RA ADRS 2 H
FCTN FAMX SEL 0 H           FCTP RA ADRS 1 L
FCTN FAMX EN 0 L            FCTP RA ADRS 0 L
FCTA A GT B H               FCTP RB ADRS 3 L
FCTN SHF MUX EN 1 L         FCTP RB ADRS 2 L
FCTN SHF MUX EN 0 L         FCTP RB ADRS 1 L
FCTN FALU FUNC SEL 2 H      FCTP RB ADRS 0 L
FCTN FALU FUNC SEL 1 H      DAPL ACC CONTEXT 0 H
FCTN FALU FUNC SEL 0 H      DAPL ACC CONTEXT 1 H
FCTN FAMX SEL 1 H           FCTC CLR RR L
FCTN LOAD ARI H             FCTH CP SYNC H
FCTN LOAD ARO H             FNME BUS ~ EXP L
FCTN LOAD ARX H             FCTJ ACC NDATA H
FCTN LOAD BRI H             FCTC ACC ZDATA H
FCTN LOAD BRO H             FCTC ACC VDATA H
FADS BUS ~ FAD L
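The load-then-shift sequence described for the V bus can be sketched as a parallel-in, serial-out register. This is a behavioral model only: the function names are hypothetical, and which end of the latch chain shifts out first is an assumption (it only changes the bit order in the console word).

```python
def vbus_capture(signals):
    """Parallel-load the V bus latches, then shift them out one bit at a time.

    signals: 0/1 values in latch order (index 0 = first latch in the chain, an assumption).
    Returns the console-side word with bit i holding signals[i].
    """
    latches = list(signals)                 # load: snapshot of the stable FPA signals
    console_word = 0
    while latches:
        bit = latches.pop()                 # shift: one latch leaves per step
        console_word = (console_word << 1) | bit
    return console_word


def decode(console_word, names):
    """Map each signal name to its captured bit."""
    return {name: (console_word >> i) & 1 for i, name in enumerate(names)}


names = ["FCTE SHF COUNT 5 H", "FCTE SHF COUNT 4 H", "FCTA A GT B H"]
word = vbus_capture([1, 0, 1])
print(decode(word, names))
```

For the FPA channel, `signals` would hold the 43 entries of Table 3-4, giving a 43-bit console word.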


3.2.2 FPA Internal Buses 

As discussed in Section 3.2, the FPA internal buses transmit data between the various data manipulation 
units. These units are arranged along two parallel 34-bit tristate buses called FP bus A and FP bus B. 
These buses transmit data from the CPU-FPA interface to the various data manipulation units, transfer 
intermediate results between units, and return the result to the FPA-CPU interface. The buses can transfer 
a complete 64-bit double-precision word or two 32-bit float words simultaneously. 


The BSC field of the microword controls a majority of the bus activity. The available sources include all 
FPA data manipulation units and the CPU-FPA interface. Refer to Table 3-5 for a summary of BSC bus 
control operations. Note that the BSC field controls only the data source. The destination is enabled via 
other control fields and accepts the data available on the FP buses. 


The buses handle both floating-point and integer numbers. The buses can handle intermediate, unpacked, 
and unnormalized data as well as final packed and normalized results. Since the buses must handle 
intermediate data, each bus contains two extra lines to handle the overflow and hidden bits. Refer to 
Figure 3-5 for a summary of data formats used on FP buses. 
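To make the overflow and hidden-bit lines concrete, the sketch below places a VAX F_floating longword on a 34-bit bus pattern. The F_floating layout itself (sign in bit 15, excess-128 exponent in bits <14:7>, fraction in bits <6:0> and <31:16>, value 0.1f × 2^(exponent - 128)) is the standard VAX format; the bus placement (hidden bit on line 32, overflow on line 33, other bits where they sit in memory) is our reading of Figure 3-5 and should be treated as an assumption. Function names are hypothetical.

```python
def f_floating_to_bus(longword):
    """Place a VAX F_floating longword on a 34-bit FP bus pattern (assumed layout).

    Sign and exponent stay in bits <15:7>, fraction bits stay in <6:0> and <31:16>,
    the hidden bit is restored on line 32, and the overflow line 33 starts clear.
    """
    exponent = (longword >> 7) & 0xFF
    hidden = 1 if exponent != 0 else 0      # exponent 0: zero (or reserved operand), no hidden bit
    return (hidden << 32) | (longword & 0xFFFFFFFF)


def bus_fraction(bus):
    """Gather fraction bits in significance order: 33 (overflow), 32 (hidden), <6:0>, <31:16>."""
    return (((bus >> 32) & 0x3) << 23) | ((bus & 0x7F) << 16) | ((bus >> 16) & 0xFFFF)


def f_value(longword):
    """Numeric value of an F_floating longword: 0.1fff... (binary) * 2**(exponent - 128)."""
    bus = f_floating_to_bus(longword)
    sign = -1 if (longword >> 15) & 1 else 1
    exponent = (longword >> 7) & 0xFF
    return sign * bus_fraction(bus) / 2**24 * 2.0 ** (exponent - 128)


print(f_value(0x4080), f_value(0x40C0), f_value(0xC080))
```

The overflow line gives a fraction sum room to reach 1 or more (as in an add of two equal-exponent operands) before normalization shifts it back.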


3.2.3 Fraction Adder/Subtractor (FAD)
The fraction adder aligns and adds or subtracts the fraction portions of two FPNs. The module contains:


Two registers that receive data from the FP buses,
Two multiplexers that manipulate the register data,
A shifter to align register contents before an add or subtract,
An ALU to add or subtract the data, and
Bus drivers to place the result on the FP buses.




Table 3-5 BSC Control Store Field


Function
Bus A ← SALU
Bus B*, Bus A* ← NSHF LO
Bus B*, Bus A* ← NSHF HI
EXP SGN (packed result)
Buses ← SALU and LSH if MUL; TEMP and LSH if DIV (LSH is accessed differently for MUL and DIV)
Bus A ← LSH
Bus B*, Bus A* ← ID bus
Bus B*, Bus A* ← LR
Bus A ← ID bus; Bus B ← RB
Bus A ← RA; Bus B ← RB
Bus A ← FALU HI/LO; Bus B ← FALU LO/HI
Bus A ← FALU LO; Bus B ← FALU HI
Bus A ← FALU LO; Bus B ← FALU HI

*The same data is placed on both buses.


Figure 3-5 FP Bus Formats (panels: single-precision (float) floating-point format on either FP bus, with the
overflow bit on line 33 and the hidden bit on line 32; double-precision floating-point format in AR format
and BR format, spread across FP bus A and FP bus B; longword integer (MULL) result format on FP bus A,
with the least significant half taken from the LSH register in the first cycle and the most significant half
from the SALU in the second cycle)


Refer to Figure 3-6. Certain FAD signals are interfaced to the V bus for maintenance and diagnostic 
purposes. Refer to Section 3.2.1.3 for a discussion of the V bus. 


Figure 3-6 Fraction Adder Block Diagram (BUS FP A <33:00> and BUS FP B <33:00> feed the AR and BR
registers; SHFMX (smaller number) and FAMX (larger number) select the operands; the shifter shifts right
under SHF COUNT <5:0> with sign extension; the FALU operates under FALU FUNC SEL <2:0>, with format
select BSC <3:0>)
The fraction parts of the FPNs are loaded into the AR and BR registers. The data entry is controlled by
the FADC (fraction processor controls) control-store field as shown in Table 3-6. Both registers are loaded 
with the MSB in bit 63. The execution of the POLY instruction causes an additional seven LSBs to be 
transmitted via FP bus A lines <14:08> (where the FPE is normally located) and placed in AR <6:0> by 
loading ARX. 


Table 3-6 Fraction Data Entry


The FADC field (uCS bits 11 through 8) selects the load operation performed on ARI, ARO, ARX, BRI,
and BRO.

Hex   uCS 11 10 9 8
0         0  0 0 0
1         0  0 0 1
2         0  0 1 0
3         0  0 1 1
4         0  1 0 0
5         0  1 0 1
6         0  1 1 0
7         0  1 1 1
8         1  0 0 0


Select lines controlled by both microcode and hardware normally load the FPF associated with the smaller
exponent into the SHFMX and the other fractional part into FAMX.


The contents of SHFMX are then right-shifted up to 63 bits so that the radix points align. The
magnitude of the exponent difference determines the amount of the shift. The shifted number is padded
on the left with its sign. In most cases, the fraction is positive (Figure 3-7).


When the two FPFs are aligned, the FALU operates on the two fractions. The FALU operation is 
determined by the opcode and the sign of the two numbers. Refer to Table 3-7. 


The output of the FALU is loaded onto the FP buses under control of hardware and the BSC microcontrol 
field (see Table 3-8). The result is in unnormalized form. When a double-precision ALU subtraction is 
done (either as the result of an ADDD, SUBD, or a POLY instruction), the exponent difference is 
examined. If it is less than or equal to 7, the operation continues as usual. However, if the difference is 8 or 
more, an error is introduced into the LSB if a shift, and then a subtract, is done. To prevent this error, 
special control hardware is enabled. A maximum shift count is generated with a sign extension of 0, thus 
placing all zeros on the shifter output. The smaller operand is routed through FAMX to the A side of the
ALU. A B - A operation (with B = all zeros) is done, complementing the operand. The larger operand remains stored in its
original register. The result of the ALU operation is output to the FP buses and reloaded into the AR or 
BR depending upon where it was before complementing. During the next machine state the complemented 
operand is aligned, sign-extended, and added to the other operand. The result is loaded onto the FP buses 
and is normalized. 
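The LSB error described above can be reproduced with integers, treating each fraction as a count of its least significant bit. In this sketch (function names hypothetical), shifting the smaller operand first and then subtracting gives a - floor(b / 2^k), which is one LSB too large whenever nonzero bits are shifted off; complementing first and then shifting with sign extension gives a + floor(-b / 2^k), which equals the exact difference truncated toward minus infinity. The exponent-difference test the hardware uses to choose between the two paths is not modeled.

```python
def shift_then_subtract(a, b, k):
    """Align b by dropping its k low bits, then subtract: a - floor(b / 2**k)."""
    return a - (b >> k)


def complement_then_add(a, b, k):
    """Complement b first (0 - b, like the FALU B - A with B = 0), then align with
    sign extension and add. Python's >> on a negative int is a sign-extending shift."""
    complemented = 0 - b
    return a + (complemented >> k)


def truncated_difference(a, b, k):
    """Exact a - b / 2**k, truncated toward minus infinity (reference result)."""
    return (a * 2**k - b) // 2**k


a, b, k = 100, 3, 1      # exact difference is 98.5
print(shift_then_subtract(a, b, k), complement_then_add(a, b, k), truncated_difference(a, b, k))
# 99 98 98
```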
Figure 3-7 SHFR Operation (unaligned data from SHFMX, sign-extended with 1s for negative and 0s for
positive numbers, passes through three shifter stages, SHFB (shifts 0, 16, 32, or 48), SHFC (shifts 0, 4, 8,
or 12), and SHFR (shifts 0, 1, 2, or 3), under SHF COUNT (magnitude of exponent difference) to give
aligned data to FALU input B)
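A sketch of the three-stage aligner, assuming a 64-bit pattern and the stage sizes shown in Figure 3-7. The 6-bit SHF COUNT is split into three 2-bit fields, one per stage; because every stage fills with the sign, the cascade gives the same result as a single arithmetic right shift by the full count. The sign is passed in explicitly rather than derived from a particular bit, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def shift_stage(value, amount, negative):
    """Right-shift a 64-bit pattern, filling vacated high bits with the sign (1s if negative)."""
    fill = (((1 << amount) - 1) << (64 - amount)) if (negative and amount) else 0
    return ((value >> amount) | fill) & MASK64


def align(value, shf_count, negative):
    """Align a 64-bit fraction by shf_count (0-63) through three cascaded stages."""
    if not 0 <= shf_count <= 63:
        raise ValueError("SHF COUNT is 6 bits")
    value = shift_stage(value, 16 * ((shf_count >> 4) & 3), negative)  # SHFB: 0, 16, 32, 48
    value = shift_stage(value, 4 * ((shf_count >> 2) & 3), negative)   # SHFC: 0, 4, 8, 12
    value = shift_stage(value, shf_count & 3, negative)                # SHFR: 0, 1, 2, 3
    return value


print(hex(align(0xF123456789ABCDEF, 20, negative=True)))
```

Splitting the count this way keeps each stage a simple 4-way multiplexer while still covering every shift from 0 to 63.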
Table 3-7 FALU Operation


Instruction   Sign of Numbers       FALU Operation
Add           Like (both + or -)    Add
Add           Unlike                Subtract
Subtract      Like                  Subtract
Subtract      Unlike                Add


FALU Operations Selected   Comment
B - A                      B = 0. Used for complementing a number when a double-precision
                           shift/subtract would lose bits off the end. Used for SUBD when the
                           exponent difference is greater than 7, and for POLYD.
A - B                      Normal subtract
A + B                      Normal add
A OR B                     Used to get A out or B out. The other side is zero.
Remaining selections       Not used


Table 3-8 FALU MUX Control


FALU Function (uCS 11, 10):
  Not used for FALU MUX control.
  Hardware determined:
    AR format: FP A ← FALU H, FP B ← FALU L
    BR format: FP A ← FALU L, FP B ← FALU H


NOTE

During double-precision add/subtract and POLY:
If EXP A < EXP B, AR format is used.
If EXP B < EXP A, BR format is used.
3.2.4 Fraction Normalizer/Divider (FNM)

The normalize/divide logic located on FNM performs the two functions indicated by its name (refer to 
Figure 3-8). The hardware can either normalize the fractional result of an add, subtract, multiply, or 
divide, or generate the quotient of a given divisor and dividend. The quotient is generated bit by bit and 
stored elsewhere. When the quotient is complete, it is returned to the same hardware to be normalized as 
any other fraction result. Both functions receive data based on microcontrol words, but once the microroutine
is started, the functions operate relatively free of microcode control until they are ready to transmit
the answer.


3.2.4.1 Normalize Operation — Before a normalize operation can take place, the remainder register must 
be cleared. A 3 in the 3-bit MSC field of the microstore word clears the remainder register during IRD. 
Since the divide operations use the remainder register (RR), it is also cleared at the end of the divide
flows (before the normalization of the quotient).


The add, subtract, multiply, and divide operations produce results with varying characteristics. The 
add/subtract operation has the widest variability in result. Operand size (both fraction and exponent), 
operand sign, and desired operation, all contribute to this variation. The subtraction of two very nearly 
equal operands can result in a very small number, that is, a number that must be shifted left many times 
before it is in final normalized form. Addition of two operands with equal exponents produces a result 
between 1 and 2, necessitating a right-shift. Since the add and subtract operations do produce widely 
varying results, special firmware in the control store is accessed and the normalizations proceed under 
firmware and hardware control. 


A divide operation produces results between 1/2 and 2. A multiply operation produces results between 1/4 
and 1. Both divide and multiply normalizations proceed under hardware-only control. 


All normalizations begin with NRC equal to 0, which parallel-loads the result to be normalized into the NR.
If the operation is an add/subtract (A/S), a BEN 5 selects special firmware based on exponent differences.
If the special firmware is enabled, an NRC equal to 2 enables the NR to shift left in 4-bit steps, three steps
per machine cycle.


Once the NR shift left is enabled, hardware looks at the top 12 bits of the NR for the first significant bit
as the leading bits are left-shifted away. In a positive number, leading zeros are disregarded and the first
significant bit is a 1. In negative numbers (2's complement notation), leading 1s are disregarded and the
first significant bit is a 0 (refer to Figure 3-9). If the first significant bit is in NR <63:60> when the data is
parallel-loaded into the NR, MSN NE SIGN becomes true, and no left shifts take place. STOP SHF goes high
whenever NR <59:56> contains the first significant bit and causes the NR to stop shifting after one more
4-bit shift (that is, when the first significant bit is in NR <63:60>). If NR <63:52> does not contain the
first significant bit, SWR remains low; all 12 bits are shifted out, and a new microstore control word is
enabled via BEN 2, which continues monitoring for the first significant bit. If the NR has been left-shifted
60 bits (as counted by the control store) without finding the first significant bit, firmware returns zero as
the result by forcing the output of the NMX to zero via FORCE ZERO.
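The 4-bit normalize search can be sketched as follows. This models only the decision logic, not the three-steps-per-cycle timing or the BEN 2 branching; the function name and return convention are hypothetical, and the 60-bit limit follows the firmware rule described above.

```python
MASK64 = (1 << 64) - 1


def find_significant_nibble(nr, res_neg, max_shift=60):
    """Left-shift NR in 4-bit steps until the first significant bit (FSB) is in NR<63:60>.

    Positive number (res_neg False): leading 0s are disregarded, the FSB is the first 1.
    Negative number (res_neg True): leading 1s are disregarded, the FSB is the first 0.
    Returns (nr, bits_shifted, found); found is False when the firmware would force a
    zero result (FORCE ZERO) after max_shift bits without finding the FSB.
    """
    insignificant = 0xF if res_neg else 0x0      # a top nibble made only of disregarded bits
    shifted = 0
    while (nr >> 60) & 0xF == insignificant:
        if shifted >= max_shift:
            return 0, shifted, False
        nr = (nr << 4) & MASK64                   # one 4-bit left shift (zeros enter on the right)
        shifted += 4
    return nr, shifted, True


print(find_significant_nibble(0x1234, res_neg=False))
```

The returned shift count is what the binary counter would feed to the NORM ROM to correct the exponent.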


When the first significant bit is in NR <63:60>, the number can be rounded and normalized by the 
remaining FNM logic. 


Figure 3-8 Fraction Normalizer/Divide Block Diagram (shows the NALU, the quotient bit stream, and BUS FP A
and BUS FP B <33:00>)


Figure 3-9 Normalize Shift Enable Control Hardware (NR <63:52> and RES NEG drive SWR, MSN NE SIGN, and
STOP SHF; if the number is negative, leading 1s are disregarded; if positive, leading 0s are disregarded)


The round-byte contents, NALU operation, and final normalization shift are controlled by the round-bit
generator (RBG). The RBG controls these functions based on NR 63, NR 62, NR 61, and RES NEG. The round
byte is combined with NR lines 39 through 36 (float or single-precision) or lines 7 through 4
(double-precision), as selected by the FLOAT line. Since the final normalization shift takes place after the
round byte is added, and the first significant bit (FSB) can be in NR 63, NR 62, NR 61, or NR 60 (it must
be in one of these four positions), the position of the round bit (1) in the round byte varies (refer to
Table 3-9). As summarized in this table, decode logic divides the 16 possible input cases into 4 cases
corresponding to the FSB in bit 63, 62, 61, and 60. Note that the RBG does not monitor NR bit 63, but since
the logic is only enabled when the FSB is in bits 63 through 60, the RBG logic can sense the contents of NR
bit 63 even though it does not monitor it. RES NEG L asserted means that the number being shifted and
normalized is negative: leading 1s (Hs) are disregarded in the search for the FSB, and the FSB is a 0 (L).
RES NEG L high indicates a positive number: leading 0s (Ls) are disregarded, and the FSB is a 1 (H). The
contents of the rounding byte are based on the location of the FSB. The rounding byte is designed to place
a one 24 bits (56 bits for double precision) behind the FSB.
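The decode summarized in Table 3-9 can be expressed compactly. In this sketch (names hypothetical), `locate_fsb` takes NR 63, NR 62, NR 61, and the sign, and infers bit 60 when none of the three qualifies; the round byte and final shift then follow from the FSB position. Placing the round 1 within NR 39-36 (float) or NR 7-4 (double) is the same as putting it 24 (or 56) places below the FSB.

```python
def locate_fsb(nr63, nr62, nr61, res_neg):
    """Return the FSB position (63-60), given that the FSB lies somewhere in NR<63:60>.

    For a positive number the FSB is the first 1; for a negative number, the first 0.
    If none of NR 63, 62, 61 qualifies, the FSB must be in bit 60.
    """
    significant = 0 if res_neg else 1
    for position, bit in ((63, nr63), (62, nr62), (61, nr61)):
        if bit == significant:
            return position
    return 60


ROUND_BYTE = {63: 0b1000, 62: 0b0100, 61: 0b0010, 60: 0b0001}
FINAL_LEFT_SHIFT = {63: -1, 62: 0, 61: 1, 60: 2}   # negative = right shift; lands FSB in NR<62>


def round_bit_position(fsb, double=False):
    """NR bit receiving the round 1: 24 places (56 for double) below the FSB."""
    return fsb - (56 if double else 24)


fsb = locate_fsb(0, 0, 1, res_neg=False)
print(fsb, bin(ROUND_BYTE[fsb]), FINAL_LEFT_SHIFT[fsb], round_bit_position(fsb))
```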


If the FSB is not in NR <63:60>, the NR is left-shifted and a binary counter counts each 4-bit shift. This
count, the RES NEG line, and NR bits 63, 62, and 61 (magnitude of final shift) determine the NORM ROM
location to be addressed. The contents of this location are added to the exponent of the result in the FALU
and correct it for all shifts that take place in the FNM. If, however, the number to be rounded is all 1s, the
addition of the rounding byte ripples through all bits and causes a fraction overflow. This is sensed by
comparing the round-byte location (indicating where the logic has decoded the current MSB of the number
to be rounded) with the location of the MSB of the rounded result. If this comparison asserts NORM ERR and
thus EALU CIN (indicating a ripple and subsequent overflow), a one is added in the EALU (the exponent
adder on FCT) to correct the exponent for the overflow. NR <63:04> goes to the NALU B side and the 4-bit
round byte goes to the A side. Normally the NR is added to the rounding byte. However, if RES NEG L is
asserted, indicating a negative (2's complement) number, the content of the NR is subtracted from the
rounding byte. This operation rounds and complements (returns to positive notation) in one step.
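The single-step round-and-complement works because, for a 2's complement pattern NR representing -x, round - NR equals x + round modulo the register width. A minimal sketch, assuming a full 64-bit NR (the hardware uses NR <63:04>) and a rounding addend already aligned to NR bit positions; the overflow check compares MSB positions, a simplification of the round-byte location comparison that drives NORM ERR. Names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def round_and_complement(nr, round_addend, res_neg):
    """One NALU operation that rounds (and, for a negative NR, also complements).

    Positive NR: NR + round. Negative NR (2's complement of x): round - NR,
    which equals x + round, so the result is already a rounded positive magnitude.
    """
    if res_neg:
        return (round_addend - nr) & MASK64
    return (nr + round_addend) & MASK64


def rounding_overflowed(magnitude, rounded):
    """True if the round carry rippled into a new, higher MSB (the NORM ERR case)."""
    return rounded.bit_length() > magnitude.bit_length()


x = (1 << 62) + 5                         # magnitude with its FSB at NR 62
negative_nr = (-x) & MASK64               # the same number as a negative 2's complement pattern
print(hex(round_and_complement(negative_nr, 1 << 38, res_neg=True)))
```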


The 60-bit result <63:04> of the NALU operation, rounded and ready to be normalized, is transmitted to 
the NMX. The high part (and only part, if float or single-precision) is transmitted through to the NSHF 
for final normalization shift. The NSHF shift control bits select a 0- to 3-bit shift for final normalization. 


Final normalization moves the MSB to the equivalent of the NR <62> position. When the data is placed 
on the FP buses, NR <62> (always a one since the fraction is now normalized) is the hidden bit and is 
placed on the FP bus A bit 32. When the data is transferred to the CPU, the hidden bit is not transferred 
and the data in NR <61> (bus A bit 6) is the MSB to be transferred. 
Table 3-9 Round Byte and Normalize Control


1. The logic decodes the four signals (NR63, NR62, NR61, and RES NEG L*) and locates the FSB.

   Positive (RES NEG L high)       Negative (RES NEG L low)        First Significant Bit (FSB)
   NR63 = 1                        NR63 = 0                        NR 63
   NR63 = 0, NR62 = 1              NR63 = 1, NR62 = 0              NR 62
   NR63 = 0, NR62 = 0, NR61 = 1    NR63 = 1, NR62 = 1, NR61 = 0    NR 61
   NR63 = 0, NR62 = 0, NR61 = 0    NR63 = 1, NR62 = 1, NR61 = 1    NR 60

   *RES NEG L high indicates a positive number. This means a 1 (H) is the FSB. RES NEG L low indicates a
   negative number. This means a 0 (L) is the FSB. RES NEG L asserted also causes a NALU subtract, thereby
   rounding and complementing the number in a single step.


2. Based on the location of the FSB, an appropriate rounding byte is generated.

   FSB      Rounding Byte Selected (Bit 3, Bit 2, Bit 1, Bit 0)
   NR 63    1 0 0 0
   NR 62    0 1 0 0
   NR 61    0 0 1 0
   NR 60    0 0 0 1


3. Also based on the location of the FSB, the final shift required to normalize and ready the result for the
   CPU is selected.

   FSB      Final Shift (SHF VAL)
   NR 63    Right 1 place
   NR 62    No shift
   NR 61    Left 1 place
   NR 60    Left 2 places
3.2.4.2 Divide Operation - This logic also performs the fraction part of the divide operation for the FPA.
Once the dividend and divisor are loaded into the FNM logic, and the quotient storage on the multiplier 
boards is enabled for either a float (single) or double-precision result, the divide operation runs under 
hardware control until the answer has been computed to the required precision. Once the answer has been 
computed, microcontrol takes over and transmits the unnormalized quotient back to the FNM logic where 
it is normalized and rounded like any other fraction. 


The hardware uses the restoring, repeated subtraction technique to divide. The dividend is initially loaded 
into the RR and the divisor is stored in the NR. The divisor (contents of NR) is subtracted from the 
dividend (contents of RR). If the result is negative, a zero is left-shifted into the answer (quotient) register 
and the content of the RR is left-shifted by one. If the result is positive or zero, a 1 is left-shifted into the 
answer (quotient) register and the result is loaded into the remainder register left-shifted by one. The 
divisor (contents of NR) is continually subtracted from the contents of the RR until 26 bits (58 bits for 
double-precision) of quotient are generated. The quotient is then rounded and normalized. 
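A minimal model of the repeated-subtraction loop, with both fractions scaled to integers of the same scale (names hypothetical). Because the dividend/divisor ratio lies between 1/2 and 2, the first bit generated has units weight, and after n steps the quotient equals floor(dividend × 2^(n-1) / divisor). The initial left-shift/right-shift adjustment of the dividend and the TEMP/LSH quotient storage are not modeled.

```python
def restoring_divide(dividend, divisor, nbits):
    """Generate nbits quotient bits by repeated subtraction (dividend in RR, divisor in NR).

    Operands are fractions scaled to integers with the same scale; with both normalized,
    dividend / divisor lies between 1/2 and 2, so the first bit is the units bit.
    Returns (quotient, final_rr).
    """
    rr, quotient = dividend, 0
    for _ in range(nbits):
        difference = rr - divisor              # NALU: RR - NR
        if difference >= 0:
            quotient = (quotient << 1) | 1     # positive or zero: shift a 1 into the quotient
            rr = difference << 1               # load result into RR, one place left
        else:
            quotient = quotient << 1           # negative: shift a 0 into the quotient
            rr = rr << 1                       # keep RR contents, shifted one place left
    return quotient, rr


# 0.75 / 0.5 with 4-bit fractions (12/16 and 8/16), 5 quotient bits: 1.1000 (binary) = 1.5
q, _ = restoring_divide(12, 8, 5)
print(bin(q))
```

For a float divide, `nbits` would be 26 (58 for double precision), as stated above.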


The division operands are loaded under microstore control. The first microstore state loads the dividend 
into the NR. The second state causes the NALU to OR the contents of the NR with the contents of the 
RR (currently clear) and load the result of the operation into the RR. In the same state the divisor is 
loaded into the NR. At the end of the second state the division operands are in their correct registers and
the divide sequencer hardware takes over. 


The divide sequencer hardware generates the RR control signals (see Figures 3-10 and 3-11). The RR 
CTL signals either load the NALU result into the RR or left-shift the RR contents based on the result 
being negative or positive. The input of the RR is hardwired to automatically produce a left-shift when 
loading the NALU result. This means that during the initial loading of the RR the dividend is left-shifted 
by one. The 11 state in Table 3-10 right-shifts the dividend by one to adjust for this before beginning the 
divide operation. 


The answer is generated at the rate of one bit per 66.6 ns. If the result of the NALU subtract is positive or 
zero, a one is left-shifted into the quotient register. A negative NALU result causes a zero to be shifted 
into the quotient register. The quotient register is made of two multiplier registers (TEMP and LSH). In
single (float) precision the quotient bit stream is shifted into TEMP (only TEMP <29:4> is used). In double
precision the bit stream shifts into LSH <31:4> and then into TEMP <29:00>. When a one is left-shifted into
TEMP <29> or <28> on the proper time phase in the multiplier logic, DIV DONE is asserted. This stops
the division and accesses a new microstore word that normalizes and rounds the quotient. 


3.2.5 Fraction Multiplier (FML and FMH) 

The fraction multiplier logic in the FPA is located on two modules: FMH (fraction multiplier high) and
FML (fraction multiplier low). The logic handles all multiply functions and part of the EMOD function, and
also stores the division quotient as it is generated. It accepts data from the FP buses, performs the required
unsigned multiplication, and gates the results back onto the FP buses (refer to Figure 3-12).


The FPA microcontrol controls the loading of both the multiplicand and multiplier into the appropriate 
FM (fraction multiplier) registers. In both float and double the complete multiplier is stored on the FMH. 
During the single-precision (float) function, the FMH handles the upper 16 bits of the multiplicand and 
FML the lower 8 bits. The answer is completed after one pass through the logic. For double-precision (56 
bits) the upper-half of the multiplicand fraction is handled in the FMH and the lower-half in the FML. 
Two passes are required to compute the final answer. 
Figure 3-10 Divide Sequence Hardware
Figure 3-11 Divide Sequence Timing


Table 3-10 Divide Sequence States


State   FNM Function      RR CTL 1   RR CTL 0   RR Function
0 0 0   NOP               L          L          NOP
0 0 1   NOP               L          L          NOP
0 1 1   LD NALU TO RR     H          H          Parallel LD**
1 1 0   Shift R*          L          H          Shift R*
1 0 0   Divide            H          H†         Parallel LD Result**
                          H          L†         Shift L RR Contents
1 0 0   Divide            Refer to previous state


*Used only once at the beginning of each divide.
†Control bit 0 is controlled by RES POS H.
**Since the RR is hardwired for a left shift, a parallel load shifts the data one place left.
Figure 3-12 Fraction Multiplier Block Diagram (includes the TEMP register and the FP bus select/drivers to
BUS FP A and BUS FP B)


The FM multiplies under the control of its own control logic. After the operands are loaded, the MCTL 
field in the FPA microcontrol is asserted. This starts the multiplication. A float multiply is stopped by the 
microcode two states (266.7 ns) after it starts. For a double multiply, control goes to a wait state and 
remains at that location until MUL/DIV DONE is enabled. When enabled this indicates that the FM logic 
has finished the operation. At this point microstore control takes over and the answer is transmitted to the 
normalize logic or, in the case of EMOD or MULL, transmitted to the CPU as an unnormalized number. 


A pipeline technique is used to obtain fast multiplication (Figure 3-13). The multiplier is divided into 4-bit
nibbles. The nibbles are then accessed consecutively by a counter-multiplexer combination (least significant
nibble first), and each nibble operates on up to 32 bits of multiplicand. The MCAND bus and MPLIER
nibbles are used to address the ROMs. The banks of ROMs provide a 4 × 4 primitive with two-way
interleaving. The data is latched (ROM STORE) and applied to the inputs of 4-bit adders (FALU). These
adders combine the ROM data to form a partial product, storing the carry-out of each 4-bit section (to be
added in on the next cycle). The partial product is latched in PPROD and passed to another row of adders
(AALU) that accumulate the final product (again, saving the carries). Thus, when the pipeline is operating,
there are four processes cycling at the same time:


1. Select ROM addresses. 

2. Latch ROM data. 

3. Form partial product. 

4. Accumulate final product. 


After the final product is calculated, the stored carries from both stages are combined with the accumulated
product using full carry look-ahead to produce the final answer in a single-precision (float) operation.
In a double-precision operation, this result is stored and used in generating the final answer during the
second pass.
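The arithmetic the pipeline performs can be checked with a software model that multiplies only through a 4 × 4 look-up table. This sketch (names hypothetical) consumes the multiplier least significant nibble first, forms a partial product from one table look-up per multiplicand nibble, and accumulates the shifted partial products; Python's unbounded integers absorb the carries that the hardware saves and adds in later, and none of the 33.3 ns timing is modeled. The two-pass helper splits the multiplicand into 32-bit halves, which is an assumption about where the FMH/FML split falls.

```python
# 4 x 4 look-up "ROM": entry [a][b] holds the 8-bit product of two nibbles.
NIBBLE_ROM = [[a * b for b in range(16)] for a in range(16)]


def rom_multiply(mcand, mplier, mcand_nibbles=8, mplier_nibbles=14):
    """Multiply using only nibble-by-nibble ROM look-ups and shifted additions."""
    product = 0
    for j in range(mplier_nibbles):                     # one multiplier nibble per step
        mp_nib = (mplier >> (4 * j)) & 0xF
        partial = 0
        for i in range(mcand_nibbles):                  # one ROM per multiplicand nibble
            mc_nib = (mcand >> (4 * i)) & 0xF
            partial += NIBBLE_ROM[mc_nib][mp_nib] << (4 * i)   # form the partial product
        product += partial << (4 * j)                   # accumulate the final product
    return product


def two_pass_multiply(mcand, mplier):
    """Double-precision style: multiply the upper and lower 32-bit halves of the
    multiplicand in separate passes and combine (the split point is an assumption)."""
    high, low = mcand >> 32, mcand & 0xFFFFFFFF
    return (rom_multiply(high, mplier) << 32) + rom_multiply(low, mplier)


print(hex(rom_multiply(0x89ABCDEF, 0x123456789ABCDE)))
```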


Each of the pipeline processes occurs at 33.3 ns intervals, except accessing ROM data, which occurs in
each bank of ROMs every 66.6 ns.


The operation of the FM hardware is discussed in three sections. The first section explains the operation of
the pipeline, concentrating on operand loading and on the manipulation of partial products, partial results,
and carries to produce the final answer. The second section concentrates on the control logic and how the
signals that control the pipeline are generated. The third section explains how the FM registers are used to
accumulate the quotient during a divide operation.


3.2.5.1 The Pipeline


Loading the Operands 

The multiplication process begins with the loading of the operands. Data is transferred along the FPA 
buses in several formats. The multiplicand loading logic sorts out these formats and loads the multiplicand 
register (MC0, MC1, and MC INT). It loads the register so that when the MCAND bus does a parallel
access of the MCAND, the MSB of the multiplicand is always in MCAND bus bit 31, and each following 
bit is progressively less significant (Figures 3-14 and 3-15). 


The multiplier, up to 56 bits (14 nibbles) long, is loaded into MP1 and MP0 on FMH. MP1 is 24 bits (6
nibbles) long and MP0 is 32 bits (8 nibbles) long. Unlike the multiplicand, the multiplier is loaded in one
format only (Figure 3-15). The MSB is in MP1 <23> and each following bit is progressively less
significant. The LSB is MP1 <00> for single-precision (float) or MP0 <00> for double-precision. The
single format is possible because, as stated before, the multiplier nibbles are used consecutively, and the
various formats are sorted out by the counter as the nibbles are used during the multiplication.
Figure 3-13 The Pipeline
Figure 3-14 Loading and Accessing the Multiplicand
Figure 3-15 Loading and Accessing the Multiplier


Selecting the Multiplicand 

The operands, multiplicand and multiplier, are enabled onto their respective buses, MCAND bus and
MPLIER bus, under control of operand bus-source logic. Refer to Figures 3-14 and 3-15 and Table 3-11.
All 32 lines of the MCAND bus are used every time. During a MULF or EMODF, and during the first pass
of a MULD or EMODD, the MCAND bus accesses MCX. Both MULF and MULD (first pass) use only the
top 24 bits, as the lower 8 bits are discarded later in the pipeline.


The MPLIER bus multiplexer begins by selecting the least significant byte of the multiplier. Interleaving
hardware later selects the high or low nibble of the bus. The multiplexer then selects a new, progressively
more significant byte every 66.6 ns.
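The order in which multiplier nibbles reach the ROM banks can be sketched in Python, counting time in 33.3-ns ticks. The sketch assumes, from the interleave description that follows, that the low nibble of each byte goes to the MP-low bank first and the high nibble to the MP-high bank one tick later; the function name is hypothetical.

```python
def nibble_schedule(mplier: int, n_bytes: int) -> list[tuple[int, str, int]]:
    """List (tick, bank, nibble) in the order the multiplier is consumed.

    One tick = 33.3 ns. The multiplexer advances one byte every two ticks
    (66.6 ns), starting with the least significant byte.
    """
    schedule = []
    for i in range(n_bytes):
        byte = (mplier >> (8 * i)) & 0xFF
        schedule.append((2 * i, "low", byte & 0xF))      # MP <03:00>
        schedule.append((2 * i + 1, "high", byte >> 4))  # MP <07:04>, 33.3 ns later
    return schedule


print(nibble_schedule(0xA5C3, 2))
# [(0, 'low', 3), (1, 'high', 12), (2, 'low', 5), (3, 'high', 10)]
```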
Table 3-11 Operand Bus Source


Operation               MPLIER Bus Nibble Select
EMODF or MULF           Start at A, do 6 nibbles
MULL (INTEGER MUL)      Start at 6, do 4, then start at 2, do 4
EMODD or MULD
  1st Pass              Start at 2, do 14
  2nd Pass              Start at 2, do 14


NOTE: MCAND bus load-enable lines are low enabled.
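The nibble-select column of Table 3-11 can be read as a short program for the nibble counter. The Python sketch below expands each entry into the nibble addresses it produces. Mapping addresses 2 through 9 to MP0 and A through F to MP1 is an assumption: it is the reading under which "start at A, do 6" covers exactly the six MP1 nibbles of a float multiplier and "start at 2, do 14" covers all 14 nibbles of a double. The table and function names are hypothetical.

```python
# Nibble-select entries from Table 3-11 as (start address, count) runs.
NIBBLE_SELECT = {
    "MULF": [(0xA, 6)],
    "EMODF": [(0xA, 6)],
    "MULL": [(0x6, 4), (0x2, 4)],
    "MULD": [(0x2, 14)],   # same for the 1st and 2nd pass
    "EMODD": [(0x2, 14)],
}


def nibble_sequence(op: str) -> list[int]:
    """Expand the runs for `op` into individual nibble addresses."""
    seq: list[int] = []
    for start, count in NIBBLE_SELECT[op]:
        seq.extend(range(start, start + count))
    return seq


def nibble_location(addr: int) -> tuple[str, int]:
    """Assumed mapping: 2-9 -> MP0 nibbles 0-7, A-F -> MP1 nibbles 0-5."""
    if 0x2 <= addr <= 0x9:
        return "MP0", addr - 0x2
    if 0xA <= addr <= 0xF:
        return "MP1", addr - 0xA
    raise ValueError(f"no multiplier nibble at address {addr:X}")
```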


Selecting the ROM Address - The Interleave Hardware

Both the MCAND and MPLIER buses are divided into 4-bit nibbles for ROM addressing. Each of the eight
MCAND nibbles is combined with an MPLIER nibble to provide the address bits for sixteen 4 X 4 look-up
ROMs. Rather than compute the product of the two 4-bit nibbles, the fraction multiply hardware looks it
up: the ROM contents are arranged so that the location addressed by two nibbles holds the 8-bit product of
those same two nibbles. Because the ROMs are relatively slow, the 16 ROMs are divided into two
interleaved 8-ROM banks. One bank is addressed by the low MPLIER nibble (MP <03:00>) and the other
by the high MPLIER nibble (MP <07:04>). Each bank is addressed on a 66.6 ns cycle; the MP low bank is
addressed first and the MP high bank second, trailing by 33.3 ns. The addressing of a ROM bank ends the
first part of the pipeline.
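A minimal Python sketch of the look-up idea: one 256-entry table stands in for each 4 X 4 ROM, and eight look-ups (one per MCAND nibble) together give the product of the 32-bit MCAND bus value and one multiplier nibble. The names are hypothetical, and in the hardware the eight outputs are combined by the PALU rather than by a single Python sum.

```python
# Contents of one 4 X 4 look-up ROM: address = (MCAND nibble << 4) | MPLIER nibble,
# data = 8-bit product of the two nibbles.
ROM = [(addr >> 4) * (addr & 0xF) for addr in range(256)]


def bank_outputs(mcand: int, mplier_nibble: int) -> list[int]:
    """Data from the eight ROMs of one bank, least significant MCAND nibble first."""
    return [ROM[(((mcand >> (4 * i)) & 0xF) << 4) | mplier_nibble] for i in range(8)]


def row_product(mcand: int, mplier_nibble: int) -> int:
    """Recombine the ROM outputs; adjacent ROMs differ in significance by 4 bits."""
    return sum(data << (4 * i) for i, data in enumerate(bank_outputs(mcand, mplier_nibble)))
```

Each ROM output spans 8 bits while adjacent ROMs are only 4 bits apart, so neighboring outputs overlap by one nibble; that overlap is what the PALU adders resolve.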


Latching the ROM Data 

The second part of the pipeline selects the outputs of one of the two ROM banks (using the ROM SEL
MUX) and latches the data (64 bits: 8 bits from each of the bank's eight ROMs) in ROM STRG. It
alternates between the low and high ROM banks on a 33.3 ns cycle.


As the selected ROM data is being latched, the first part of the pipeline applies a new address to the ROM
bank just selected. The other ROM bank's outputs are selected during the next cycle (33.3 ns later); that
bank's address lines were changed 33.3 ns earlier, so its outputs are already settling. Each bank therefore
has 66.6 ns to settle, even though ROM STRG latches new data every 33.3 ns.
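The timing argument can be checked with a small Python model that counts in 33.3-ns ticks: ROM STRG latches one bank every tick, yet each bank still gets two ticks (66.6 ns) between being addressed and being latched. The tick model and the initial offsets are assumptions made for illustration, and the function name is hypothetical.

```python
def interleave_schedule(n_ticks: int) -> list[tuple[int, str, int]]:
    """Return (tick, bank latched into ROM STRG, ticks that bank had to settle).

    One tick = 33.3 ns. The MP-high bank is addressed one tick after the
    MP-low bank, and each bank is re-addressed in the tick its data is latched.
    """
    last_addressed = {"low": -2, "high": -1}
    events = []
    for tick in range(n_ticks):
        bank = "low" if tick % 2 == 0 else "high"
        events.append((tick, bank, tick - last_addressed[bank]))
        last_addressed[bank] = tick  # new address applied while the data is latched
    return events
```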


Forming the Partial Product 

The outputs of ROM STRG and any carries from the previous PALU add are added to form the partial
product. The PALU consists of eight 4-bit adders. The outputs of ROM STRG are wired to the PALU adder
inputs such that bits of equal significance are combined. The PALU sum outputs, without the carries, are
stored in the PPROD LATCH; the carries are stored in CARRY HOLD registers to be added in on the
next PALU add. The latching of the partial products in the PPROD LATCH ends the third part of the
pipeline.


As indicated previously, each multiply cycle selects four new bits from the multiplier register, and each
new group of four bits is four positions more significant than the last. This means that the input of the
PALU add becomes four bits more significant each multiply cycle. Because of this increase in
significance, the stored carry-out of each PALU adder is input (on the next cycle) to the carry-in of the
same PALU adder rather than to the carry-in of the next PALU adder.
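The weight argument behind this wiring can be written out. Let adder j handle bits 4j through 4j+3 of the PALU inputs, and let c_j^(k) be that adder's carry-out on multiply cycle k (symbols introduced here for illustration). Since every cycle's inputs are four bits (one factor of 16) more significant than the last:

```latex
\[
\text{weight of adder } j \text{ on cycle } k = 16^{\,j+k},
\qquad
c_j^{(k)} \cdot 16^{\,j+k+1} \;=\; c_j^{(k)} \cdot 16^{\,j+(k+1)} .
\]
```

The stored carry therefore has exactly the weight of adder j on cycle k+1. Routing it to adder j+1 instead would give it weight 16^(j+k+2), a factor of 16 too large.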


Note that while the third part of the pipeline is operating, new ROM data is being placed in ROM STRG 
to be presented to the PALU inputs on the next cycle, and new ROM addresses are being generated to 
access new data. 


Accumulating the Result 

The fourth and final section, the AALU and associated accumulator (ACCM), adds the partial products 
computed in the previous pipeline section to the result stored in the ACCM. Included in the result stored in 
the ACCM are carries from the previous AALU cycle. The result is latched into the ACCM and LSH 
registers. 


The AALU, ACCM, and ALU carry-hold interconnections automatically shift the ACCM contents and 
ALU carry-hold contents to adjust for the 4-bit increase of each new partial product. Because each partial 
product input to the AALU is four bits more significant than the previously stored ACCM contents, the 
outputs of the ACCM are wired to shift the ACCM contents four bits right (a decrease in significance) 
before being added to the PPROD LATCH contents. The lower four bits of the AALU output are always 
right-shifted into the LSH register. In double-precision operations, the content of this register is the least 
significant half of the result. 
As with the PALU carries, the carry-out of each AALU adder is stored and added in on the next cycle. Also
as in the PALU logic, the stored carries are added in the AALU adder that generated them, because the
content of the AALU is now four bits more significant than when the stored carries were generated.


The latching of the accumulating final result in the ACCM ends the fourth pipeline section. 


The four sections of the pipeline continue to operate until stopped by the FM control logic. The stopping 
point is selected based on both function and precision. 
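The four pipeline sections can be modeled behaviorally in Python to show why holding carries and feeding them back into the same adder still yields an exact product. This is a sketch under stated assumptions, not a gate-level description: nine nibble positions are used for both the PALU and the AALU (the hardware PALU has eight 4-bit adders), pipeline latency is ignored, LSH is allowed to grow without limit, and the final carry resolution that the SALU performs is done with ordinary integer addition rather than with the 2-bit-shift wiring. All names are hypothetical.

```python
# Behavioral sketch of the FM pipeline: nibble-wide PALU and AALU adders whose
# carry-outs are held and fed back into the same adder on the next cycle.
ROM = [(a >> 4) * (a & 0xF) for a in range(256)]  # 4 X 4 product ROM
POSITIONS = 9                                      # nibble positions 0..8 (assumed)


def fm_multiply(mcand: int, mplier: int, n_nibbles: int) -> int:
    m = [(mcand >> (4 * i)) & 0xF for i in range(8)]  # MCAND bus nibbles
    accm = [0] * POSITIONS
    a_carry = [0] * POSITIONS                          # AALU carry hold
    p_carry = [0] * POSITIONS                          # PALU carry hold
    lsh = 0
    for k in range(n_nibbles):
        q = (mplier >> (4 * k)) & 0xF                  # next multiplier nibble
        rom = [ROM[(mi << 4) | q] for mi in m]         # ROM STRG contents
        # PALU: combine ROM nibbles of equal significance plus held carries.
        pprod = [0] * POSITIONS
        for j in range(POSITIONS):
            total = p_carry[j]
            if j < 8:
                total += rom[j] & 0xF
            if j > 0:
                total += rom[j - 1] >> 4
            pprod[j], p_carry[j] = total & 0xF, total >> 4
        # AALU: ACCM shifted four bits right, plus PPROD, plus held carries.
        new_accm = [0] * POSITIONS
        for j in range(POSITIONS):
            shifted = accm[j + 1] if j + 1 < POSITIONS else 0
            total = shifted + pprod[j] + a_carry[j]
            new_accm[j], a_carry[j] = total & 0xF, total >> 4
        accm = new_accm
        lsh |= accm[0] << (4 * k)                      # low nibble shifts into LSH
    # Final carry resolution (the role the SALU plays).
    upper = sum(accm[j] << (4 * (j - 1)) for j in range(1, POSITIONS))
    upper += sum((a_carry[j] + p_carry[j]) << (4 * j) for j in range(POSITIONS))
    return (upper << (4 * n_nibbles)) + lsh
```

The invariant that makes this work is that every held carry is re-entered at exactly its own weight, so no cycle ever needs to wait for a carry to ripple across the full width of the product.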


SALU Operation

When a stop is initiated, the whole pipeline stops and new logic, the SALU, is accessed. The two sets of
stored carries still in the pipeline are added to the total product at the output of the AALU. When a
pipeline stop is initiated, the AALU output (SALU input) is the contents of ACCM plus the current
PPROD. Both the ACCM plus PPROD addition (the AALU operation) and the PPROD-forming addition
(the PALU operation) produce stored carries.


The hard-wired 2-bit shift in the PPROD LATCH input is not part of the several 4-bit shifts that take 
place throughout the FM logic. Rather, this 2-bit shift formats the stored carries so they may be easily 
combined for a final answer in the SALU. Both the PALU and AALU are composed of 4-bit adders with 
carry-outs. This means that the carry-outs are generated every four bits and that the PALU and AALU 
stored carry-outs can be treated as numbers of the following format. 


X000X000X      where X is a stored carry (data bit)
               and 0 is a zero (nonsignificant bit)


Conventional wiring (output of a 4-bit PALU adder to input of a 4-bit PPROD LATCH to a 4-bit AALU
adder) would cause the data bits of the PALU stored carries to line up (be of equal significance) with
those of the AALU stored carries. This would prevent the PALU stored carries, the AALU stored carries,
and the ACCM result from being combined in one operation in one adder (the SALU). However, wiring
the PPROD LATCH inputs and outputs with a 2-place shift generates a PALU stored-carry number whose
data bits fall between the AALU stored-carry data bits. Because each data bit of the PALU stored-carry
number always falls on a nonsignificant bit of the AALU stored-carry number (and vice versa), both
stored-carry numbers can be input to one side of the SALU. Refer to Figure 3-16.
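The effect of the 2-bit shift can be sketched in Python, assuming that, as seen by the SALU, the PALU adder boundaries sit two bit positions away from the AALU adder boundaries. Because the two stored-carry numbers never have data bits in the same position, combining them needs no adder: OR-ing them gives the same value as adding them, so one SALU input can hold both. The names are hypothetical.

```python
def carry_word(carries: list[int], offset: int) -> int:
    """Place one stored carry every 4 bits, starting at bit `offset` (X000X000... format)."""
    word = 0
    for j, carry in enumerate(carries):
        word |= (carry & 1) << (4 * j + offset)
    return word


def salu_carry_operand(aalu_carries: list[int], palu_carries: list[int]) -> int:
    """Combine both stored-carry numbers into one SALU input.

    The PALU carries sit 2 bit positions away from the AALU carries, so no two
    data bits share a position and OR gives the same result as addition.
    """
    aalu = carry_word(aalu_carries, 0)
    palu = carry_word(palu_carries, 2)
    return aalu | palu
```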


How the SALU result is used depends on the operation and its precision. If the SALU result is the final
answer, the result is transferred to the FP buses under both op code control and FPA microcontrol. If,
however, the operation is double-precision, the result is stored and shifted to format it for later
operations under FM logic control. Before the shift, the most significant half of the result is in TEMP and
the least significant half is in LSH. The shift transfers the contents of LSH (the least significant half) to
the ACCM register, which is designated ACCM 14 (see Figure 3-20) at this time. Within the same
microcycle, the most significant half is transferred from TEMP to the just-vacated LSH.


For the second pass through the pipeline, the second half (the more significant half) of the multiplicand is
accessed from registers MC1 and MCIL. Logic enabled only during the second pass combines the data
transferred to LSH from TEMP with the new result being accumulated. Otherwise, the operation of the
pipeline during the second pass is the same as during the first pass.
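Why two passes are enough follows from splitting the multiplicand. The Python sketch below assumes, from the description of the MCAND bus, that the first pass uses the 24 less significant multiplicand bits and the second pass the 32 more significant bits. It keeps every product bit and does not model which low-order bits the hardware discards; the function name is hypothetical.

```python
LOW_BITS = 24   # first pass: less significant multiplicand bits (via MCX)
HIGH_BITS = 32  # second pass: more significant half (from MC1 and MCIL)


def two_pass_multiply(mcand: int, mplier: int) -> int:
    """Exact 56 x 56-bit product formed in two passes (no truncation modeled)."""
    if not 0 <= mcand < 1 << (LOW_BITS + HIGH_BITS):
        raise ValueError("multiplicand must fit in 56 bits")
    low = mcand & ((1 << LOW_BITS) - 1)
    high = mcand >> LOW_BITS
    first_pass = low * mplier    # saved in TEMP/LSH, then realigned by XFER
    second_pass = high * mplier  # accumulated on top of the realigned first pass
    return (second_pass << LOW_BITS) + first_pass
```

The shift by LOW_BITS is the arithmetic counterpart of the XFER realignment: the first-pass result sits 24 places below the second-pass product it is added to.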


3.2.5.2 FM Control - The fraction multiplier logic is controlled by hardware rather than by firmware. Four
state bits select one of 13 function states that control the FM logic. Within each state the state bits,
various internal flags, and various flags from other FPA logic are combined to provide the control signals
needed to implement the selected state's functions (Figure 3-17 and Table 3-12).
Figure 3-16 SALU Operation - Adding the Stored Carries


The states can be roughly divided into four groups:


IRD
Integer multiply
Fraction multiply
Divide


This section discusses the states by groups and in the order shown above. Within each discussion, the states 
are discussed in the order they are accessed within the group. This is important because the function of 
some states is partially dependent on the previous state. 


The state of the logic is defined by the output of the PRESENT STATE register which is clocked on a 
33.3 ns cycle. The inputs to this register (the next state) are based on the current state and internal and 
external flags. A majority of the internal flags provide sequence information and are generated in the logic 
shown in Figure 3-18. 
Table 3-12 FM Control States


Figure 3-18 FM Control Logic


IRD Group (Instruction Register Decode) 

When the FM logic is not performing a multiplication, it is in the IRD group. While waiting, the logic
continually cycles through the four states in this group, preparing the FM logic for a multiply and
monitoring the op codes in the instruction buffer. Initially (in INIT) the FM logic is set up for a MULF,
but if the op codes indicate a MULL, MULD, or EMODD, new information is loaded into the FM logic
in the CONT state. The FPA microcontrol loads the MPLIER and MCAND registers during IRD if the
op codes indicate a multiply operation.


The control logic enters INIT whenever the multiplier operand control (OPLD) field in the FPA microcon- 
trol store is F. This normally happens during the FPA IRD or when a multiply operation is finished. The 
SYNC state is entered at CPU T33.3 and synchronizes the FM clock with the CPU clock. It also clears 
FLAG. CONT is entered at T66.6 and loads new information if the op codes indicate a MULL, MULD, 
or EMODD. TEST is entered at T99.9. In TEST, if the MCNT bit in the FPA microstore is not asserted 
(indicating that the FPA does not want the multiply pipeline to begin), the FM returns to the INIT state 
and continues waiting. If, however, MCNT is asserted (indicating that the multiplier operands are loaded 
and the FPA wants a multiply to start), the correct execution state is selected based on the op code. Refer 
to Table 3-12 for a summary of IRD group functions. 


Multiply Float Path 

If the op code indicates a MULF, the PIPE state is selected and the multiplier pipeline can continue. Note
that during INIT the nibble counter is loaded with MULF control data, and ROM lookup starts on the
basis of that data. Because the operation is a MULF, the data already at the beginning of the pipeline is
correct.


The logic remains in this state (PIPE), running the pipeline and accumulating the answer, until D1 (a
timing signal) is asserted. When D1 is asserted, the current content of PPROD + ACCM + the stored
carries is the final correct answer.
Asserting D1 selects the CADD state. This state NOPs most of the FM registers and enables the SALU
add of stored carries to the AALU content. CADD also latches the SALU result into TEMP. The FM
logic remains in CADD 99.9 ns (until D4 is asserted).


Since FLAG is cleared during the IRD group and never set, asserting D4 initiates the DONE state. This 
state asserts MUL/DIV DN and NOPs all other FM logic. MUL/DIV DN, monitored by the FPA control 
logic, returns control to the FPA microcontrol. It is the FPA control store that selects the MULF result, 
via a multiplexer, directly from the SALU outputs rather than from TEMP. The FM logic remains in 
DONE until returned to INIT by the multiplier INIT code in the multiplier operand control field of the 
FPA microcontrol store. Refer to Figure 3-19 for a summary of MULF control. 


MULD Path 

If, when the FM control logic is in TEST, the op codes indicate a double-precision multiply (DOUBLE 
set), the WAIT state is entered. Initially (in INIT) the nibble counter was loaded for MULF and ROM 
lookup started. Then in CONT (66.6 ns later), when a MULD was decoded, new data was loaded into the 
nibble counter. The WAIT state waits for the data loaded in CONT to settle and access new ROM 
locations before beginning the pipeline. After 66.6 ns in this state, FLAG is set. In this context, FLAG set 
indicates the first pass in a double-precision multiply. After 99.9 ns, since both DOUBLE and FLAG are 
set, PIPE is entered. 


The logic remains in the PIPE state, running the pipeline and accumulating the answer until D1 (a timing 
signal) is asserted. When D1 is asserted, the current contents of ACCM + the two sets of stored-carries are 
the first half of the MULD partial product. 


Asserting D1 selects the CADD state. This state NOPs most of the FM registers and enables the SALU 
add of stored-carries and the ACCM content. CADD latches the upper 32 bits of the first half of the 
MULD partial product in TEMP. The lower 32 bits accumulate in LSH during the pipeline operation. The 
FM logic remains in CADD 99.9 ns (until D4 is asserted). 


Since FLAG is asserted, indicating first pass, asserting D4 selects the XFER state. Four cycles in the 
XFER state transfer the content of TEMP and LSH to LSH and ACCM (see Figure 3-20), clear FLAG, 
and clear the stored-carry registers. 


The assertion of D8 returns the FM logic to PIPE. With FLAG now cleared and DOUBLE set, ALU ADD
is asserted. This signal causes the data stored in LSH during the XFER state to be added (four bits per
cycle) to the final product being developed. Six cycles transfer all 24 bits stored during XFER. While
these bits are being right-shifted from the right end of LSH into the MSBs of the developing final
product, the LSBs of the developing final product are being right-shifted into the left end of LSH.


When 20 bits have been transferred in from LSH, SHF ZERO is enabled. This causes the logic to enter 
the ADDZ state. The final 4-bit transfer of LSH data takes place during the first ADDZ state. After that, 
the ALU that added LSH to the ACCM is disabled. During this state, the pipeline continues functioning 
and the LSBs of the accumulating final product are still shifted into the left end of LSH. The only 
difference between PIPE and ADDZ during this second pass is, in PIPE, LSH data bits are added into the 
MSB of the ACCM, and, in ADDZ, zeros are added. Note that this state even has the same ending criterion
as PIPE: D1 asserted.


D1 asserted transfers control to the CADD state. As discussed for the MULF path, CADD is entered when
the ACCM + the two sets of stored carries is the final answer. In CADD, the stored carries are added to
the AALU content by the SALU and the result is latched into TEMP. Since FLAG is now clear, the
assertion of D4 causes a transfer to DONE.
Figure 3-19 MULF Control
Figure 3-20 The XFER State


In DONE, MUL/DIV DONE is asserted. This causes the FPA microcode to select and transfer, via 
multiplexers, the upper 32 bits of the double-precision result from the SALU onto FP bus A. The lower 32 
bits are selected and transferred from the LSH register onto FP bus B. Refer to Figure 3-21 for a summary 
of MULD control. 


MULL Path 

If the op code being monitored during CONT decodes as MULL, new data is loaded into the nibble
counter. The logic proceeds to TEST and, in TEST, selects WAIT as the first execution state because
INT (integer) is set.


In WAIT, the ROM outputs addressed as a result of the new data loaded into the nibble counter during
CONT are given time to settle before the pipeline starts. When FLAG is set, the data has settled and the
integer multiply pipeline state (MULL) is entered.

The FM logic remains in the MULL state as the pipeline accumulates the final product (the least 
significant half accumulates in LSH). When COUNT = 3 is set, the AALU + the two sets of stored- 
carries is the final product. COUNT = 3 asserted selects DONE. 


In DONE, MUL/DIV DONE is asserted and the final product is available. 
Figure 3-21 MULD Control (Sheet 2 of 3)
Figure 3-21 MULD Control (Sheet 3 of 3)


The FPA microcode loads the upper half from the SALU onto FP bus A during one machine cycle. On 
the following cycle the lower half is loaded from LSH onto FP bus A. Refer to Figure 3-22 for a summary 
of MULL control. 


3.2.5.3 Division - The TEMP and LSH registers in the fraction multiplier logic are used to store the
quotient generated during floating-point division. The registers are concatenated, with the MSB of LSH
shifting into the LSB of TEMP.


During a divide operation the FPA asserts DIV and loads the divisor and dividend into the FNM. In the 
FM logic, the nibble counter is loaded for a MULF and clocks through until TEST. To initiate quotient 
storage the multiply control field (MCNT) of the FPA microcode must be asserted. The combination of 
MCNT and DIV asserted selects the NOP state in the division path. 


The FM logic enters NOP with the nibble counter odd and exits when the nibble counter is even. The two 
cycles (66.6 ns) allow the first quotient bit to be formed. 


From NOP, the FM logic enters DIV. In DIV, the logic left-shifts LSH and TEMP one bit every even 
cycle. When doing a single-precision division the single quotient bit is input to both LSH bit 4 and TEMP 
bit 4. The data input to LSH is never accessed in single-precision. In double-precision the TEMP bit 4 
quotient input is blocked and the TEMP bit 3 is input to TEMP bit 4 on the left-shifts. 


DIV DONE is asserted when quotient bits are left-shifted into TEMP bits 28 and 29. This condition is tested
at T66.6 of each state and transfers control to DONE if true. 


In DONE, MUL/DIV DONE is asserted, stopping the division process in the FNM. This causes the FPA 
microcode to access TEMP for a single-precision quotient and TEMP and LSH for a double-precision 
quotient. 
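The state sequencing described in the IRD, MULF, MULD, MULL, and division paths can be summarized as a next-state function. The Python sketch below uses the 13 state names from the text but not the hardware state encoding (Table 3-12); each call stands for one 33.3-ns PRESENT STATE clock. Two simplifications are assumptions: the flags are treated as already-sampled inputs, and the return to INIT is modeled only from DONE (the OPLD = F case). The class and function names are hypothetical.

```python
from dataclasses import dataclass

STATES = {"INIT", "SYNC", "CONT", "TEST", "PIPE", "CADD", "DONE",
          "WAIT", "XFER", "ADDZ", "MULL", "NOP", "DIV"}


@dataclass
class Flags:
    """Inputs sampled on a state clock; names follow the text."""
    opld_init: bool = False   # multiplier operand control field = F
    mcnt: bool = False
    div: bool = False
    double: bool = False
    integer: bool = False     # INT
    flag: bool = False        # FLAG
    d1: bool = False
    d4: bool = False
    d8: bool = False
    shf_zero: bool = False
    count3: bool = False      # COUNT = 3
    nibble_even: bool = False
    div_done: bool = False


def next_state(state: str, f: Flags) -> str:
    if state == "INIT":
        return "SYNC"
    if state == "SYNC":
        return "CONT"
    if state == "CONT":
        return "TEST"
    if state == "TEST":
        if not f.mcnt:
            return "INIT"                 # keep waiting
        if f.div:
            return "NOP"                  # division path
        return "WAIT" if f.double or f.integer else "PIPE"
    if state == "WAIT":
        if not f.flag:
            return "WAIT"                 # let the new ROM data settle
        return "MULL" if f.integer else "PIPE"
    if state == "PIPE":
        if f.d1:
            return "CADD"
        if f.double and not f.flag and f.shf_zero:
            return "ADDZ"                 # second pass of a MULD
        return "PIPE"
    if state == "ADDZ":
        return "CADD" if f.d1 else "ADDZ"
    if state == "CADD":
        if not f.d4:
            return "CADD"
        return "XFER" if f.flag else "DONE"
    if state == "XFER":
        return "PIPE" if f.d8 else "XFER"
    if state == "MULL":
        return "DONE" if f.count3 else "MULL"
    if state == "NOP":
        return "DIV" if f.nibble_even else "NOP"
    if state == "DIV":
        return "DONE" if f.div_done else "DIV"
    if state == "DONE":
        return "INIT" if f.opld_init else "DONE"
    raise ValueError(f"unknown state {state}")
```

Writing the transitions this way makes the role of FLAG visible: the same CADD state leads to XFER on the first pass of a double-precision multiply and to DONE in every other case.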


3.2.6 Exponent Processor 

The exponent processor, part of the FCT, processes the FP exponent during FP operations. During FP
multiply/divide, the processor adds or subtracts the exponents as needed. During add and subtract
operations, the processor stores the larger exponent and determines the final exponent by taking into
account the operation, fraction right-shifts, and left-shifts during normalization. By comparing the
exponent magnitudes, the exponent processor also controls the FPF addition and subtraction in the FAD
(see Figure 3-23).


The FPEs are loaded from FP buses A and B into LA and LB under control of the EAC field in the
microcontrol (Table 3-13). The contents of LA and LB are loaded into CALU and DALU. CALU
computes LA - LB and DALU computes LB - LA. The carry-out signal from DALU selects either
CALU or DALU as the positive exponent difference (SHF COUNT) that provides FPF control in the FAD.
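The selection of the positive exponent difference can be sketched in Python. The sketch assumes 8-bit exponent fields, subtraction performed as A + ~B + 1, and a DALU carry-out of 1 meaning no borrow (LB >= LA), as in conventional two's-complement ALUs; the function name is hypothetical.

```python
EXP_MASK = 0xFF  # 8-bit exponent field (assumed)


def shift_count(la: int, lb: int) -> tuple[int, str]:
    """Return (SHF COUNT, source) chosen from the CALU/DALU pair."""
    calu = (la + (~lb & EXP_MASK) + 1) & EXP_MASK  # LA - LB
    dalu_sum = lb + (~la & EXP_MASK) + 1           # LB - LA, keeping the carry
    dalu, dalu_carry = dalu_sum & EXP_MASK, dalu_sum >> 8
    # Carry out of DALU = no borrow: LB - LA is non-negative, so use DALU.
    return (dalu, "DALU") if dalu_carry else (calu, "CALU")
```

Computing both differences at once means the choice costs only a multiplexer delay rather than a second subtraction.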


The contents of LA and LB, as well as XR (poly register), PR (product register), a normalization constant,
and 80 (hex) are possible inputs to the EALU. Input selection is controlled by both microcontrol and
hardware. Refer to Table 3-14 for an input selection summary.


The EALU operation is controlled by the microcontrol field EALUC (refer to Table 3-15). The output of
the EALU can be loaded into XR or PR for further processing, or loaded onto the FPA bus as a final
answer. XR and PR are loaded under control of the EAC microcontrol field [see Table 3-13 (bits 0
and 1)]. The EALU output to FP bus A <14:07> is controlled by the BSC microcontrol field (bus A EXP);
refer to the discussion of the BSC field in Section 3.2.2. The partial answers in XR and PR are reloaded
into the EALU via AMUX and BMUX and are combined with either a normalization constant or +80
(hex) before they are loaded onto FP bus A <14:07> (see Table 3-14). The normalization constant, a
variable quantity, adjusts the exponent for the shifts required to normalize the FPF in the FAD. The actual
normalization constant is read from a ROM on the FNM rather than computed. The 80 (hex) constant
corrects for the offset that results when FPEs are added or subtracted during MUL/DIV exponent
processing (see Sections 1.5 and 1.6).
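The role of the 80 (hex) constant follows from the excess-128 exponent format of VAX F_floating and D_floating data: adding two biased exponents includes the bias twice, and subtracting them removes it entirely. The Python sketch below shows that arithmetic. The norm_adjust parameter stands in for the normalization constant (negative for left shifts), which the hardware reads from a ROM; all names are hypothetical.

```python
BIAS = 0x80  # VAX F_floating and D_floating exponents are stored excess-128


def product_exponent(ea: int, eb: int, norm_adjust: int = 0) -> int:
    """Biased exponent of a product: the sum counts BIAS twice, so remove one."""
    return ea + eb - BIAS + norm_adjust


def quotient_exponent(ea: int, eb: int, norm_adjust: int = 0) -> int:
    """Biased exponent of a quotient: the difference cancels BIAS, so add it back."""
    return ea - eb + BIAS + norm_adjust


def exponent_status(e: int) -> str:
    if e > 0xFF:
        return "overflow"
    if e <= 0:
        return "underflow"  # a biased exponent of 0 cannot hold a nonzero value
    return "ok"
```

For example, 1.0 is stored with exponent 0x81 (0.5 x 2^1) and 2.0 with 0x82. Their fraction product is 0.25, which needs one normalizing left shift, so product_exponent(0x81, 0x82, -1) gives 0x82, the exponent of 2.0.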
Figure 3-22 MULL Control
Figure 3-23 Exponent Processor Block Diagram
Table 3-13 EAC Control Store Field


                            EAC Field
       Controls      Controls      Controls      Controls
       Bus A to LA   Bus B to LB   EALU to PR    EALU to XR
Hex    Transfers     Transfers     Transfers     Transfers     Operation

0      0             0             0             0             NOP
1      0             0             0             1
2      0             0             1             0
3      0             0             1             1
4      0             1             0             0
5      0             1             0             1
6      0             1             1             0
7      0             1             1             1
8      1             0             0             0
9      1             0             0             1
A      1             0             1             0
B      1             0             1             1
C      1             1             0             0
D      1             1             0             1
E      1             1             1             0
F      1             1             1             1


NOTE
Although the control field appears to be a 4-bit field,
each of the 4 bits actually controls a single, independent
function.
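As the note says, EAC is really four independent enables. A minimal Python decoder makes that explicit; the bit weights follow the column order of Table 3-13, with bits 0 and 1 controlling XR and PR as stated in Section 3.2.6, and the names are hypothetical.

```python
# Bit weights of the EAC field (Table 3-13); each bit enables one transfer.
EAC_BITS = {
    0x8: "Bus A to LA",
    0x4: "Bus B to LB",
    0x2: "EALU to PR",
    0x1: "EALU to XR",
}


def decode_eac(eac: int) -> set[str]:
    """Return the transfers enabled by a 4-bit EAC value (an empty set is a NOP)."""
    if not 0 <= eac <= 0xF:
        raise ValueError("EAC is a 4-bit field")
    return {name for bit, name in EAC_BITS.items() if eac & bit}
```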
Table 3-14 EALU Input Control


AMXC Field (uCS 35, 34)    Operation
                           LA to EALU A input
                           LB to EALU A input
                           PR to EALU A input
                           Hardware select: for FP add/subtract, larger
                           exponent (LA or LB) to EALU A input

BMXC Field (uCS 33, 32)    Operation
                           Normalization constant to EALU B input
                           XR to EALU B input
                           80 (hex) to EALU B input
                           LB to EALU B input

Table 3-15 EALU Control Store Field


                    Control Signals Generated
uCS    uCS    Carry       Mode          EALU Operation
31     30     Required

0      0                  H (logic)     Pass A input
0      1      0           L (arith)     A - B
1      0      1           L (arith)     A + B
1      1      X           H (logic)     Force 1's out (interpreted as underflow;
                                        this function is used to generate zeros
                                        on the buses)


X = Don't care
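The four EALUC codes can be sketched as a Python function. It assumes an 8-bit EALU, treats uCS 31 as the more significant bit of the code, and models only the data result, not the carry and mode signals listed in the table; the names are hypothetical.

```python
def ealu(ealuc: int, a: int, b: int, width: int = 8) -> int:
    """Data result selected by the 2-bit EALUC field (Table 3-15)."""
    mask = (1 << width) - 1
    if ealuc == 0b00:
        return a & mask          # logic mode: pass A input
    if ealuc == 0b01:
        return (a - b) & mask    # arithmetic mode: A - B
    if ealuc == 0b10:
        return (a + b) & mask    # arithmetic mode: A + B
    if ealuc == 0b11:
        return mask              # logic mode: force 1's out
    raise ValueError("EALUC is a 2-bit field")
```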
3.2.7 Sign Processor

The sign processor, a section of the FCT, determines the sign of the FP operation result using both 
hardware and the microcontrol field SGNC (sign latch controls). Refer to Figure 3-24 and Tables 3-16 
and 3-17. This section receives information indicating the sign and magnitude of each operand, the desired 
operation (add, subtract, multiply, divide, poly), and the magnitude of the result. The resulting sign is
placed on FP bus A <15>.


Figure 3-24 Sign Processor Block Diagram


3.2.8 Control Store and Logic 

As indicated in previous sections, the control store and logic, located on the FCT, provide the control
signals for all FPA operations. These include both FPA internal operations (the transfer and manipulation 
of FP data) and external operations (interface between the FPA and CPU). See Figure 3-25. 


The FPA has two normal operating functions: instruction register decode (IRD) and performing an FPA 
instruction. The FPA normally alternates between these two functions. A third function, exceptional 
conditions, handles error conditions, traps, and interrupts. The FPA executes the third function whenever 
an exceptional condition is sensed. 


The FPA and the CPU run synchronously. This means that both have 133.3 ns microcycles divided into
four time states (CPT0, CPT33.3, CPT66.6, CPT99.9) and that T0 of the CPU coincides with T0 of the
FPA. Both load a new microword only at T0.


The FPA always keeps two updated copies of the 16 CPU general (scratchpad) registers. These copies are
used by the FPA to optimize register-mode instructions. The register copies are accessed and updated by
the same lines that access and update the CPU registers themselves. To ensure that the FPA never reads a
changing register, the CPU updates the general register set (and the FPA copies) between T66.6 and
T133.3 (T0), and the FPA reads the copies only between T0 and T66.6.
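The read/write split that keeps the FPA from ever sampling a changing register can be sketched as a lookup on the time state. Time is counted in 33.3-ns ticks from a T0, and the names are hypothetical.

```python
TIME_STATES = ("T0", "T33.3", "T66.6", "T99.9")  # four 33.3-ns states per 133.3-ns cycle


def register_window(tick: int) -> str:
    """Which side may use the general-register copies during a given tick."""
    state = TIME_STATES[tick % 4]
    if state in ("T0", "T33.3"):
        return "FPA reads copies"        # between T0 and T66.6
    return "CPU updates registers"       # between T66.6 and the next T0
```

Because the two windows never overlap, no extra handshake is needed for register-mode operands.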
Table 3-16 SGNC Control Store Field


SGNC Field                      Operation
SGN C2   SGN C1   SGN C0
uCS 07   uCS 06   uCS 05        Load into SA             Load into SB

0        0        0             SA (NOP)                 SB (NOP)
0        0        1             FP bus A 15              SB (NOP)
0        1        0             SA + Op Code = SUB       SB (NOP)
0        1        1             Result*                  SB (NOP)
1        0        0             SA (NOP)                 FP bus B 15
1        0        1             FP bus A 15              FP bus B 15
1        1        0                                      SB (NOP)
1        1        1                                      SB (NOP)


* This is the resultant sign, determined by the op code, the signs of the operands, the relative magnitude of
the exponents, and the sign of the FALU result. It can also be forced if a floating underflow or overflow
occurs.


Table 3-17 Sign Processor Operation


                Relative Size     Sign of Result
Op Code         of Exponents      (FALU sign)        Result*
MULX                                                 SA XOR SB
DIVX                                                 SA XOR SB
ADDX                                                 SA
SUBX                                                 SA
ADDX                                                 SB
SUBX                                                 SB
ADDX                              Positive           SB
ADDX                              Negative           SB
SUBX                              Positive           SB
SUBX                              Negative           SB


X = Don't Care


*Except for errors: in case of overflow, the sign is forced to a 1, while underflow forces a 0.
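Table 3-17 encodes the usual sign rule for sign-magnitude arithmetic. The Python sketch below states that rule in general form, with A and B standing for the two operands whose signs are SA and SB; the a_larger argument stands in for the comparison the table makes from the exponents and, when needed, the FALU sign. It illustrates the rule rather than transcribing the sign processor's combinatorial logic, and the function and parameter names are hypothetical.

```python
def result_sign(op: str, sa: int, sb: int, a_larger: bool, *,
                zero: bool = False, overflow: bool = False,
                underflow: bool = False) -> int:
    """Sign of A op B for sign-magnitude operands (a_larger: |A| >= |B|)."""
    if overflow:
        return 1                  # forced to 1 on overflow
    if underflow or zero:
        return 0                  # forced to 0 on underflow; a VAX zero has sign 0
    if op in ("MUL", "DIV"):
        return sa ^ sb            # SA XOR SB
    sb_effective = sb ^ 1 if op == "SUB" else sb  # A - B is A + (-B)
    return sa if a_larger else sb_effective
```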
Figure 3-25 Control Store and Logic Block Diagram


The FPA as a whole is directly controlled by the CPU. The CPU can enable and disable the FPA via bit
15 of the FPA status register (ID bus register 17). The FPA is normally enabled by the CPU.


The FPA is a microcontrolled unit containing 512 words by 48 bits of control store in ROM. Each word is 
divided into various length control fields with each field providing independent control of a particular 
section of the FPA. In general, these fields: 


1. Control the operation of the FPA data manipulation components, 
2. Coordinate the operation of the FPA with the operation of the CPU, and 


3. Initiate the operation of parts of the FPA control logic. 


FPA operations are controlled by accessing specific ROM words, each of which causes a particular set of
FPA actions.
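How independent fields share one microword can be sketched in Python. Only fields whose bit positions appear in this section's tables are included (SGNC in uCS <07:05>, EALUC starting at bit 31, BMXC in <33:32>, and AMXC in <35:34>); the EALUC low bit of 30 is an assumption, and the dictionary and function names are hypothetical.

```python
# Field positions (high bit, low bit) in the 48-bit FPA microword.
FIELDS = {
    "SGNC": (7, 5),
    "EALUC": (31, 30),   # low bit assumed
    "BMXC": (33, 32),
    "AMXC": (35, 34),
}
WORD_BITS = 48


def field(word: int, name: str) -> int:
    """Extract one control field from a microword."""
    if not 0 <= word < 1 << WORD_BITS:
        raise ValueError("microword is 48 bits")
    hi, lo = FIELDS[name]
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)
```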


3.2.8.1 IRD - The IRD state is controlled by location IRD.1 in the control ROM. In this state a new
microword is not read until STALL is disabled. ACC INSTR H and IB CALL from the CPU microword
disable the STALL condition. When the FPA leaves IRD, the ACC ERROR bit in the status register is
cleared if it was set during a previous cycle. The op code and specifier decode logic monitors the IRC OPC
7:0 and specifier lines. The OPC lines enable ACC INSTR H when an FPA instruction is in the IB, and
they are decoded to determine the instruction type. The specifier decode lines determine the specifier type.
The output of this decode logic is transmitted to the next address logic.
Location IRD.1 controls all FPA operations in the IRD state. The operation assumed is a
register-to-register (R-R) operation. The FPA continually begins this operation, without any indication
that the next operation is an R-R, because it has both operands in its register set: if the next FPA
operation is an R-R, both operands are already loaded. Location IRD.1 has MSC = 6 and next address =
180. This information is transmitted to the next address logic and, along with the outputs of the op code
and specifier decode logic, determines the correct next microaddress.


In the next address logic (see Figure 3-26 and Table 3-18) the MSC = 6. The op code and specifier decode 
logic lines select the address offset to be ORed with next address (= 180) to select the next microaddress. 
MSC = 6 selects the A-Fork inputs from op code and specifier decode logic lines and transmits them 
through the A-B Fork multiplexer. This selects the correct offset based on instruction type, float or double, 
and specifiers 1 and 2. 


The offset is ORed with 180 and, since STALL is no longer enabled (ACC INSTR H is high), the next
CPT0 selects the correct microword to control the next FPA cycle. If the data is already in the FPA, an
optimized routine is selected.
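The address formation at a fork can be sketched as a bitwise OR. The sketch assumes that the next addresses 180 and 100 are hexadecimal (both fit the 9-bit address of a 512-word store); the offsets used in the checks are hypothetical, since the actual offset assignments depend on the decode logic.

```python
CS_WORDS = 512       # 512-word control store: 9-bit microaddress (CRADR 08:00)
A_FORK_BASE = 0x180  # next address in IRD.1 (assumed hexadecimal)
B_FORK_BASE = 0x100  # next address used for the B-Fork (assumed hexadecimal)


def fork_address(next_adr: int, offset: int) -> int:
    """OR a fork offset from the A-B Fork multiplexer into the next address."""
    addr = next_adr | offset
    if addr >= CS_WORDS:
        raise ValueError("address outside the control store")
    return addr
```

Using OR rather than addition works because the base addresses leave their low-order bits clear, so the offset only fills in bits that are already zero and no carry logic is needed.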


3.2.8.2 Performing an FPA Instruction - Once an FPA instruction is sensed, the microcontrol words that
are selected, and the order in which they are selected, depend on the operation desired, float or double,
the location of the operands, and the relative size of the operands and result.


The FPA first ensures that it has all the required data. If both operands are in registers, or one is in a
register and the other is a short literal, all the data is in the FPA after the A-Fork test and the FPA
transfers directly to the execution flows. If not, the first operand is fetched during A-Fork, and then
MSC = 7 and next address = 100 are transmitted to the next address logic.


In the next address logic, MSC = 7 selects the B-Fork inputs from the op code and specifier decode, and 
transmits them through the A-B Fork multiplexer to be ORed with next address = 100. The offset selected 
depends on instruction type, double or float, and type of specifier 2. As before, if the data is already in the 
FPA, an optimized routine is selected; otherwise, the FPA waits for the CPU to fetch data. 


In some data transfers (A-Fork or B-Fork) the FPA must wait for data to be transmitted from the CPU
via the ID bus. The microcode has a special WAIT bit to enable STALL for this purpose. The CPU
indicates that the required data is on the ID bus by asserting CP SYNC. CP SYNC causes the data to be
stored in the FPA and clears STALL, thereby enabling a new microword to be read and FPA operations to
continue.


Table 3-18 Next Address Lines


Address                    Description

Next Address Control Lines

FCTK BEN 2:0 H             From FPA control store; selects lines to be monitored during
                           execution flows.

CS 71, 70                  CPU accelerator control field:
                           00 - NOP
                           01 - CPSYNC
                           10 - ACC TRAP (to 3-bit address specified by CPU USI field)
                           11 - REDEFINE USI

CS 57, 56, 55              If CS71 and CS70 are high, enabling DEC USI, a 6 on these
                           lines enables POLY DONE and a 7 enables FP TRAP.

FCTH ACC TRAP H            High during accelerator trap, low otherwise.

FCTH FP TRAP L             Low during FP trap, high otherwise.

FCTH TRAP DIS L            Low during either FP trap or accelerator trap, high otherwise.

Next Address Selector Controls

DEC uSI                    FCTH DEC uSI L enabled and CS 57, 56, and 55 high enable
                           FCTH FP TRAP; otherwise it is high.

A-FORK B-FORK SELECT MUX   Enable H causes all highs out and does not affect the next
                           address. Enable L enables the select input to select A-B data.

NEXT ADDRESS MUX           Enable H causes all highs out. If enable is low, S low selects
                           the A input.

BEN MUX                    Enable high causes all highs out.

Address Lines

FCTR CRADR 08:00 H         To control store; selects address. Can also be transmitted to
                           the CPU via Reg 16 as current ADR.

FCTK NEXT ADR 08:00        From control store; next address from microword.

FCTH TRAP A 07:00 L        Contains either trap address or next address.
to FCTF

FMHR TRAP A 7:00 H         FP trap address from MAINT REG ID BUS.

FCTH BRC 2:0 L             From branch enable MUX (BEN). Monitors various FPA
                           conditions and modifies the next address during execution
                           flows based on the BEN field in FPA microcode.

A-B FORK ADR               (Not a signal name on prints.) From A-FORK B-FORK select
                           MUX. Monitors op code and specifier type from IB and
                           modifies address in A-B forks.

FCTF FLOAT H               Based on op code. Used during A-B forks and by branch
                           enable logic (BEN).

CS 57, 56, 55              Select trap address during ACC trap. Also refer to
                           CS 57, 56, 55 in control lines.
[Diagram: trap control decode of the CS bus signals feeds a next address select mux whose inputs are the next address (from the current microword), the ACC trap address (CS bus), the FP trap address (ID bus maintenance register <16:23>), A-B data from the A or B fork data control (MSC = 6 or 7? FLOAT?), and BEN data from the branch enable mux.]

Figure 3-26 Next Address Logic 
Once the FPA has all required data, ACC OVERIDE is asserted. This signal, transmitted to CPU 
microaddress bit 12, causes the CPU to select the specialized FPA microcode in the writable control store 
(WCS) rather than the PCS. This prevents the CPU from beginning the microcode floating-point routines 
(used when no FPA is present) that would otherwise execute the FP instructions. The enabling of ACC 
OVERIDE is based on the instruction type (IRC lines) and the execution point counter (IRC EP <2:0>). Note 
that since the FPA cannot fetch data itself, the data-fetch routines (CPU AFORK and BFORK) are allowed 
to continue until the FPA has all required data. 
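The override amounts to forcing one bit of the CPU microaddress. A minimal sketch, assuming the override behaves as a simple OR of bit 12 into the address the CPU would otherwise use; the function name and the sample address are hypothetical.

```python
ACC_OVERIDE_BIT = 12  # CPU microaddress bit driven by ACC OVERIDE


def cpu_microaddress(upc: int, acc_overide: bool) -> int:
    """Return the microaddress the CPU actually uses (hypothetical model).

    With ACC OVERIDE asserted, bit 12 is forced to 1, so execution moves to
    the FPA-specific microcode instead of the CPU's own floating-point
    routines; otherwise the address passes through unchanged.
    """
    if acc_overide:
        return upc | (1 << ACC_OVERIDE_BIT)
    return upc


print(oct(cpu_microaddress(0o1234, False)))  # 0o1234
print(oct(cpu_microaddress(0o1234, True)))   # 0o11234
```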


When the FPA has all the data, the FPA execution flows are entered. These flows perform the manipulation 
required to add, subtract, multiply, and divide (A, S, M, and D). This includes unpacking the number and 
individually manipulating its FPF and FPE parts, as well as checking the operands and results for unusual 
conditions (zeros, underflow, overflow, and so on). During execution flows, the BEN field selects lines to be 
monitored and used to modify the next address. The 3-bit BEN field of each microword selects a group of 3 
of the 24 possible lines; these are ORed with the next address field of the microword to form the address. 
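The BEN mechanism is a multiway branch: the three selected condition lines become low-order bits ORed into the next address field. A minimal sketch, assuming the lines are treated as active-high logical conditions (the prints show them as BRC 2:0 L) and that the microcode places branch targets so the low three bits of the next address field are zero; the group table is a hypothetical subset named after Table 3-19 entries.

```python
# Hypothetical subset of BEN groups, named after Table 3-19 entries.
# Each tuple lists the lines feeding (BRC2, BRC1, BRC0).
BEN_GROUPS = {
    1: ("FLOAT", "IRBR1", "IRBR0"),            # op code decode
    5: ("A_OR_B_ZERO", "SUB_ED_LT2", "COMPL"),
}


def branch_address(nad: int, ben_field: int, conditions: dict) -> int:
    """OR the three selected condition bits into the 9-bit next address."""
    if ben_field == 0:  # NOP: no branch, use the next address field as is
        return nad
    brc = 0
    for name in BEN_GROUPS[ben_field]:
        brc = (brc << 1) | int(conditions.get(name, False))
    return (nad | brc) & 0o777  # 512-word control store: 9-bit address


# FLOAT and IRBR0 true, IRBR1 false: offset 0b101 from the base address.
print(oct(branch_address(0o120, 1, {"FLOAT": True, "IRBR0": True})))  # 0o125
```

Each of the eight possible condition combinations lands on its own location, which is why a single microword can dispatch several ways at once.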
The BEN multiplexer monitors signals from both the CPU and the FPA. POLY DONE and CP SYNC are 
transmitted from the CPU using CS lines 71, 70, 57, 56, and 55. FLOAT, IRBR0 L, and IRBR1 L are 
generated in the FPA but are summaries of op code information transmitted from the instruction buffer. 
All other BEN lines monitor FPA internal conditions. Refer to Table 3-19 for a summary of BEN fields. 


Finally, the flows put the result into the correct form and assert FP SYNC to inform the CPU that the 
answer is available. 


The CPU accepts the answer via the DFMX bus drivers on the FNM, using DAP ENA ACC D (1), and also 
reads the ACC Z, V, C, and N data lines to determine the condition codes of the answer. Once the CPU 
has the answer, it transmits CP SYNC and the FPA returns to its IRD state. 


Table 3-19 BEN Control Store Field

BEN     Operation                                  Summary
Field   (BRC2 L, BRC1 L, BRC0 L)

0       NOP
1       FLOAT H*, IRBR1 L*, IRBR0 L*               Opcode decode
2       SWR, SWR, SWR                              Shift within range
3       RSV H, B H, A=0 H                          Operand(s) equal zero;
                                                   reserved operand
4       POLY DN L*, CP SYNC H*, FLOAT*
5       (A or B=0) H, SUB*ED<2 H, COMPL H          Operand(s) equal zero;
                                                   check exponent difference
6       MUL/DIV DN H                               Multiply done; division done
7       PR8&H                                      Error condition

*From the CPU.


3.2.8.3 Exception Conditions — At any time during either IRD or instruction states, the CPU can direct 
the FPA to enter a trap routine for error recovery or microdiagnostics. The trap routines are located in the 
FPA’s own microcode. There are two separate sets of trap routines: ACC traps for CPU and FPA errors, 
and FP traps for microdiagnostics. Both trap routines are initiated via CS lines 71 and 70. 


If CS bus 71 is H and CS bus 70 is L, an ACC TRAP is initiated. An ACC TRAP addresses the FPA 
microcode location selected by CS bus lines 57, 56, and 55 (location 0-7). These traps are normally 
initiated for power-up and abort sequences. 


If CS bus 71, 70, 57, and 56 are high and 55 is low, an FP trap is initiated. The FP trap selects an 8-bit 
address previously stored in ID register 16 (the status register) to access one of 256 addresses in the FPA 
microcode (locations 0-255). These trap locations normally handle FPA microdiagnostics (refer to Figure 
3-26). 
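The CS-line encodings can be summarized as a small decoder. This sketch follows the encodings listed in Table 3-18 (00 NOP, 01 CP SYNC, 10 ACC TRAP, 11 REDEFINE uSI, with a 6 on CS 57:55 for POLY DONE and a 7 for FP TRAP). The function and parameter names are hypothetical, the `id_reg16_trap` argument stands in for the trap address held in ID register 16, and other DEC uSI codes are not described here.

```python
POLY_DONE_CODE = 6  # CS 57:55 value under DEC uSI, as listed in Table 3-18
FP_TRAP_CODE = 7    # CS 57:55 value under DEC uSI, as listed in Table 3-18


def decode_cs(cs71: int, cs70: int, cs57_55: int, id_reg16_trap: int):
    """Return (action, FPA microaddress or None) for the CS control lines."""
    field = (cs71 << 1) | cs70
    if field == 0b00:
        return ("NOP", None)
    if field == 0b01:
        return ("CP SYNC", None)
    if field == 0b10:
        # ACC TRAP: CS 57, 56, 55 select FPA microcode locations 0-7.
        return ("ACC TRAP", cs57_55 & 0o7)
    # field == 0b11: REDEFINE uSI (DEC uSI enabled)
    if cs57_55 == POLY_DONE_CODE:
        return ("POLY DONE", None)
    if cs57_55 == FP_TRAP_CODE:
        # FP trap: 8-bit address previously stored in ID register 16.
        return ("FP TRAP", id_reg16_trap & 0xFF)
    return ("REDEFINE uSI", None)


print(decode_cs(1, 0, 5, 0))      # ('ACC TRAP', 5)
print(decode_cs(1, 1, 7, 0o345))  # ('FP TRAP', 229)
```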


3.3 FPA MICROCONTROL FIELDS 
This section summarizes all the fields in the FPA microcontrol word. Figure 3-27 shows the complete 
microcontrol word, all the fields, and the microcode mnemonics. Table 3-20 lists the function of each field. 
[Diagram: 48-bit FPA control word. Bits 47-32: NEXT ADDRESS, BRANCH ENABLE, EALU A INPUT, EALU B INPUT. Bits 31-16: EALU CONTROL, FP SYNC, MCTL, EXPONENT PROCESSOR CONTROL, WAIT, MISCELLANEOUS CONTROLS, NORM. REGISTER CONTROL, SCRATCH PAD CONTROL. Bits 15-00: BUS A-BUS B DATA SOURCE CONTROL, FRACTION PROCESSOR CONTROL, SIGN LATCH CONTROL, REMAINDER REGISTER CONTROL, MULTIPLIER OPERAND CONTROL.]

Figure 3-27 FPA Control Word Fields 


3.4 FPA MICROCODE STRUCTURE 

The FPA contains a 512-word by 48-bit memory. This memory provides microcontrol of the FPA during 
normal operation and holds diagnostic programs for maintenance and troubleshooting. Approximately 225 
locations are used for normal microcontrol and 200 locations contain diagnostic programs. The remaining 
locations are available for future use. 


The microcontrol code has an IRD state (instruction register decode) and three fork points (A, B, and C). 
The FPA remains in the IRD state until an FPA instruction is decoded. The FPA then enters A-Fork to 
receive the operands. If both operands are registers or short literals, optimized routines are entered and 
computation begins. Otherwise, B-Fork is entered. If the second operand is not register data, C-Fork is 
entered. Otherwise a B-Fork optimization is taken. Figure 3-28 shows the basic microcode structure and 
indicates the microcode starting addresses of the various routines. 
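The fork sequence above can be traced with a short decision function. A minimal sketch, assuming "register data" at the second fork means a register-mode operand only (so a short literal second operand paired with a memory first operand goes to C-Fork); the enum and function names are hypothetical.

```python
from enum import Enum


class Spec(Enum):
    REGISTER = "register"
    SHORT_LITERAL = "short literal"
    OTHER = "other"  # any operand the CPU must fetch


def fork_path(op1: Spec, op2: Spec) -> list:
    """Trace the FPA fork points for a two-operand instruction."""
    fast = {Spec.REGISTER, Spec.SHORT_LITERAL}
    path = ["IRD", "A-Fork"]
    if op1 in fast and op2 in fast:
        # Both operands are available at once: optimized routine.
        path.append("optimized routine")
    else:
        path.append("B-Fork")
        if op2 is Spec.REGISTER:
            path.append("B-Fork optimization")
        else:
            path.append("C-Fork")
    return path


print(fork_path(Spec.REGISTER, Spec.SHORT_LITERAL))
print(fork_path(Spec.OTHER, Spec.OTHER))
```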


3.5 FPA INTERFACE FIRMWARE 


The CPU-FPA interaction is handled by specialized firmware located in the CPU’s joint control store 
(JCS). 
Table 3-20 FPA Control Word Field Definitions

Bits              Field                          Function

47:39 (9 bits)    NAD - Next Address             Contains the address of the next control
                                                 word to be accessed.
38:36 (3 bits)    BEN - Branch Enable            Selects signals to be used for next
                                                 address calculations.
35:34 (2 bits)    AMXC - A Mux Control           Selects the A input to the FCT exponent
                                                 ALU.
33:32 (2 bits)    BMXC - B Mux Control           Selects the B input to the FCT exponent
                                                 ALU.
31:30 (2 bits)    EALUC - EALU Control           Controls the FCT exponent ALU operation.
29 (1 bit)        FPSYNC - Floating-Point        Transmits FP SYNC to the CPU.
                  Synchronize
28 (1 bit)        MCTL - Multiply Control        Starts the FML and FMH fraction multiply
                                                 operation.
27:24 (4 bits)    EAC - Exponent Processor       Controls the FCT (exponent processing).
                  Control
23 (1 bit)        WAIT - Wait                    Controls the FPA wait loop operation.
                                                 Stalls until CP SYNC.
22:20 (3 bits)    MSC - Miscellaneous Control    Controls miscellaneous FPA operations.
19:18 (2 bits)    NRC - Normalization Register   Controls the fraction normalize operation
                  Control                        in the FNM.
17:16 (2 bits)    SCR - Scratchpad Control       Handles the FPA General Register copies
                                                 on the FNM.
15:12 (4 bits)    BSC - Bus A-Bus B Data Source  Controls data transmission along the FPA
                                                 buses.
11:8 (4 bits)     FADC - Fraction Processor      Controls FAD fraction processing.
                  Controls
7:5 (3 bits)      SGNC - Sign Latch Controls     Controls sign calculation on the FCT.
4 (1 bit)         LRR - Load Remainder Register  Controls the remainder register (RR) on
                                                 the FNM.
3:0 (4 bits)      OPLD - Operand Load            Loads fractions for multiplication on the
                  (Multiplier Control)           FML and FMH.
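Table 3-20 is enough to pack and unpack microwords in software, for example when reading a control-store dump. A minimal sketch, assuming the bit positions in Table 3-20 and treating each field as an unsigned integer; the function names are hypothetical.

```python
# (mnemonic, high bit, low bit), taken from Table 3-20.
FIELDS = [
    ("NAD", 47, 39), ("BEN", 38, 36), ("AMXC", 35, 34), ("BMXC", 33, 32),
    ("EALUC", 31, 30), ("FPSYNC", 29, 29), ("MCTL", 28, 28),
    ("EAC", 27, 24), ("WAIT", 23, 23), ("MSC", 22, 20), ("NRC", 19, 18),
    ("SCR", 17, 16), ("BSC", 15, 12), ("FADC", 11, 8), ("SGNC", 7, 5),
    ("LRR", 4, 4), ("OPLD", 3, 0),
]
WORD_BITS = 48


def decode_control_word(word: int) -> dict:
    """Split a 48-bit FPA microword into its named fields."""
    if not 0 <= word < 1 << WORD_BITS:
        raise ValueError("control word must fit in 48 bits")
    return {name: (word >> lo) & ((1 << (hi - lo + 1)) - 1)
            for name, hi, lo in FIELDS}


def encode_control_word(**values: int) -> int:
    """Assemble a microword from field values; unnamed fields are zero."""
    unknown = set(values) - {name for name, _, _ in FIELDS}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, hi, lo in FIELDS:
        value = values.get(name, 0)
        width = hi - lo + 1
        if not 0 <= value < 1 << width:
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << lo
    return word


word = encode_control_word(NAD=0o125, BEN=1, WAIT=1, MSC=7)
fields = decode_control_word(word)
print(fields["NAD"], fields["BEN"], fields["WAIT"], fields["MSC"])  # 85 1 1 7
```

The 9-bit NAD field is what limits the control store to 512 words.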
Figure 3-28 FPA Microcode Structure 
This firmware handles numerous interface tasks. For ADD, SUBT, MUL, and DIV operations, it accepts 
and stores the FPA results and condition codes, and handles any exceptions flagged by the FPA. For 
3-operand op codes, it calls the specifier decoding microcode in the base machine to decode the third 
operand. It also handles the special requirements of the EMOD, MULL, and POLY instructions. The firmware 
is accessed when the FPA overrides the CPU address by forcing uPC <12> to 1. This happens when the FPA 
detects an execution or optimization exit at a CPU A-Fork, B-Fork, or C-Fork for an FPA-implemented 
instruction. 


3.5.1 Major Interface Functions 

This firmware coordinates the interface between the CP microcode and the FP microcode. This includes 
the normal transfers of CPU data to the FPA, FPA results back to the proper register in the CPU, and 
various control signals for both normal and exception control. 


Table 3-21 lists important macro and micro orders that are used by the FPA interface firmware to 
generate and monitor the signals that are transferred between the CPU and FPA. 


3.5.2 Major Instruction Groups 
The FPA firmware can be broken into the following four groups of routines. 


• Generalized instructions handler 
• POLY handler 
• MULL handler 
• EMOD handler 


The first group handles all ADD, SUB, MUL, and DIV instructions as well as FPA exceptions. This group 
provides optimized flows for operands located in the general register set and literal operands. 


The POLY group transmits the polynomial coefficients to the FPA as they are needed and transmits 
POLY DONE when the last coefficient has been transmitted. It also responds when the FPA detects 
overflow, underflow, or a reserved-operand coefficient. Overflow and reserved operand detections cause a 
branch to exception condition routines in the base machine. If an underflow is detected, the firmware 
records it and continues execution of the POLY flows. 
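POLY evaluates a polynomial by Horner's rule, with the coefficient table ordered from the highest-degree term down to the constant term, as defined by the VAX architecture. The sketch below shows where POLY DONE falls in the coefficient stream. It assumes exact Python floats rather than VAX F/D floating rounding, ignores the overflow, underflow, and reserved-operand paths, and uses a hypothetical function name.

```python
def poly_with_handshake(arg: float, coefficients: list):
    """Evaluate a polynomial the way POLY does, recording POLY DONE.

    coefficients run from the highest-degree term to the constant term.
    Returns the result and a list of (coefficient, poly_done) pairs, one
    per coefficient sent to the FPA.
    """
    if not coefficients:
        raise ValueError("POLY needs at least one coefficient")
    last = len(coefficients) - 1
    events = [(coefficients[0], last == 0)]
    result = coefficients[0]
    for i in range(1, last + 1):
        c = coefficients[i]
        # The firmware asserts POLY DONE while sending the last coefficient.
        events.append((c, i == last))
        result = result * arg + c  # Horner step performed by the FPA
    return result, events


value, sent = poly_with_handshake(2.0, [3.0, 0.0, 1.0])  # 3x^2 + 1 at x = 2
print(value)  # 13.0
print(sent)   # [(3.0, False), (0.0, False), (1.0, True)]
```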


The MULL routine accepts the result of the longword integer multiplication from the FPA. Since the FPA 
creates an unsigned 64-bit product using 32-bit signed operands, the firmware must correct the result by 
subtracting out the effects of the negative signs on the magnitude result. To do this, the firmware stores 
the operands in a form that can later be used as subtrahend operands to correct the product and, based on 
this stored information, determines the correction sequence to select when the result is transmitted from 
the FPA. The firmware also creates the proper signed result, sets the condition codes, and tests for 
overflow. 
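The correction can be checked with ordinary integer arithmetic. If a negative 32-bit operand is read as unsigned, its value grows by 2^32, so the unsigned product is too large by the other operand's bit pattern shifted left 32 places; subtracting that term for each negative operand (modulo 2^64) yields the signed product. A minimal sketch, assuming this form of correction; the actual firmware may order or encode the subtraction differently, and the function names are hypothetical. The condition codes follow the VAX MULL definition, in which C is always cleared.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def fpa_unsigned_product(a_bits: int, b_bits: int) -> int:
    """The FPA's part: a 64-bit product of the raw 32-bit patterns."""
    return (a_bits & MASK32) * (b_bits & MASK32)


def mull(a: int, b: int):
    """Signed 32 x 32 multiply built from an unsigned product (sketch).

    Returns the 32-bit result and the condition codes N, Z, V, C.
    """
    if not (-(1 << 31) <= a < 1 << 31 and -(1 << 31) <= b < 1 << 31):
        raise ValueError("operands must be signed 32-bit integers")
    a_bits, b_bits = a & MASK32, b & MASK32
    # Subtrahends prepared before the FPA result arrives. When both are
    # negative, the extra 2**64 cross term vanishes modulo 2**64.
    correction = 0
    if a < 0:
        correction += b_bits << 32
    if b < 0:
        correction += a_bits << 32
    product = (fpa_unsigned_product(a_bits, b_bits) - correction) & MASK64
    signed64 = product - (1 << 64) if product >> 63 else product
    low32 = product & MASK32
    result = low32 - (1 << 32) if low32 >> 31 else low32
    cc = {"N": int(result < 0), "Z": int(result == 0),
          "V": int(signed64 != result), "C": 0}
    return result, cc


print(mull(-3, 4))          # (-12, {'N': 1, 'Z': 0, 'V': 0, 'C': 0})
print(mull(0x7FFFFFFF, 2))  # (-2, {'N': 1, 'Z': 0, 'V': 1, 'C': 0})
```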


The FPA handles only the fraction multiply of the EMOD instructions. As a result, the EMOD firmware 
is relatively short. While the FPA is doing the fraction multiply, this routine adds the exponents and checks 
for reserved operands. It then accepts the fraction multiply result from the FPA, checks for a zero result, 
and formats the FPA result so control can return to the EMOD routines in the base machine. 
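The division of labour works because a floating-point product splits into a fraction product and an exponent sum that can be computed independently. A minimal sketch, assuming Python floats in place of VAX F/D floating formats and leaving out the EMOD extension operand, reserved-operand checks, and overflow; the helper name is hypothetical.

```python
import math


def emod_split(multiplier: float, multiplicand: float):
    """Illustrate the EMOD division of labour (hypothetical helper).

    Returns (integer part, fraction part) of the product.
    """
    f1, e1 = math.frexp(multiplier)    # multiplier = f1 * 2**e1
    f2, e2 = math.frexp(multiplicand)
    fraction_product = f1 * f2         # the FPA's fraction multiply
    exponent_sum = e1 + e2             # added by the CPU firmware meanwhile
    if fraction_product == 0.0:        # firmware checks for a zero result
        return 0, 0.0
    product = math.ldexp(fraction_product, exponent_sum)
    fraction, integer = math.modf(product)
    return int(integer), fraction


print(emod_split(2.5, 3.0))  # (7, 0.5)
```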
Table 3-21 Interface Microcode

Name of Macro     Signal Monitored     Data        Function
                  or Generated         Transfer

ID-D. SYNC        CP SYNC generated    CPU > FPA   Gates the CPU D-Register's contents onto
                                                   the ID bus and generates CP SYNC. CP SYNC
                                                   indicates that valid data is on the bus.

D-ACCEL & SYNC    CP SYNC generated    FPA > CPU   Gates data placed on the DFMX bus by the
                                                   FPA into the D-Register. CP SYNC indicates
                                                   that the FPA's data has been accepted.

Q-ACCEL & SYNC    CP SYNC generated    FPA > CPU   Gates data placed on the DFMX bus by the
                                                   FPA into the Q-Register. CP SYNC indicates
                                                   that the FPA's data has been accepted.

ACCEL?*           FP SYNC monitored    FPA > CPU   ACC<UB0> = 1: result data (on the DFMX
(BEN/ACC<UB2,                                      bus) and condition codes are being
UB1, UB0>)†                                        transmitted by the FPA. For double
                                                   precision, the condition codes are passed
                                                   with the first half.

                  ERR SYNC monitored   NO          ACC<UB1> = 1: an exception has been
                                                   detected by the FPA. This initiates
                                                   specialized routines that handle the
                                                   exception.

                  Not MULL** generated             ACC<UB2> = 1: separates MULL and MULF.

POLY.DONE         POLY.DONE generated  CPU > FPA   Indicates that the last coefficient in the
                                                   POLY operation is being presented. In
                                                   POLYD, used while both halves of the last
                                                   coefficient are transmitted.

TRAP.ACC[1]       Accelerator Trap                 Returns the FPA microcode to the IRD
                                                   state.

MSC/LOAD.ACC.CC†                                   Loads PSW<N,Z,V,C> with FPA-generated
                                                   condition codes from the CPU latches
                                                   loaded in the previous cycle.

*  This macro, in combination with the target constraint block, enables the CP
   microcode to test for various conditions.
†  This is a microorder rather than a macro.
** This is a condition rather than a specific signal.